Python Machine Learning: Stochastic Gradient Descent

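In the previous post we implemented an adaptive linear neuron (Adaline) trained with gradient descent. That method uses every training sample to compute each update of the weight vector, so it is also called batch gradient descent. Now suppose the dataset is very large, say a million samples. With batch gradient descent, every single weight update has to process all million samples, so training is slow and inefficient. Can we keep using gradient descent without touching every sample on every update? Stochastic gradient descent (SGD) was proposed to do exactly that.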

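Stochastic gradient descent can update the weight vector using a single training sample:
\[ \Delta w = \eta \left( y^{(i)} - \phi(z^{(i)}) \right) x^{(i)} \]
This usually converges faster than batch gradient descent because the weights are updated far more often. Updating from a single sample is also noisier than updating from the whole training set, which can help the algorithm avoid getting stuck in a local minimum. Two points need attention when using this method. First, the samples must be presented in random order, so shuffle the training set before every epoch. Second, the learning rate does not have to stay fixed: letting it shrink as the number of iterations grows helps the algorithm converge.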
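The two points above (a fresh shuffle before each epoch and a learning rate that shrinks over time) can be sketched as follows. This is a minimal illustration, not the implementation used later in this post: the schedule `c1 / (step + c2)`, the constants `c1` and `c2`, the helper names `decayed_eta` and `sgd_epoch`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def decayed_eta(step, c1=1.0, c2=100.0):
    """Hypothetical schedule: the learning rate shrinks as step grows."""
    return c1 / (step + c2)


def sgd_epoch(w, X, y, rng, step=0):
    """One shuffled pass of single-sample updates; w[0] is the bias."""
    for idx in rng.permutation(len(y)):       # new random order each epoch
        xi, target = X[idx], y[idx]
        error = target - (np.dot(xi, w[1:]) + w[0])
        eta = decayed_eta(step)
        w[1:] += eta * error * xi              # eta * (y_i - phi(z_i)) * x_i
        w[0] += eta * error
        step += 1
    return w, step


# Synthetic, assumed data: 50 samples, 2 features, labels in {-1, 1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = np.where(X[:, 0] + X[:, 1] >= 0, 1.0, -1.0)

w = np.zeros(3)
step = 0
for _ in range(5):
    w, step = sgd_epoch(w, X, y, rng, step)
```

The step counter is carried across epochs so that the learning rate keeps decreasing over the whole training run rather than resetting at the start of each pass.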

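We now have batch gradient descent, which uses all the samples, and stochastic gradient descent, which uses a single sample. A compromise between the two is mini-batch learning, which updates the weight vector using a subset of the training samples each time.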
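As a rough sketch of mini-batch learning for the same linear unit, the function below updates the weights once per batch using the summed error of that batch. The function names `minibatch_epoch` and `half_sse`, the batch size of 10, and the synthetic data are assumptions chosen for illustration only.

```python
import numpy as np


def half_sse(w, X, y):
    """Half the sum of squared errors of the linear unit; w[0] is the bias."""
    errors = y - (X.dot(w[1:]) + w[0])
    return 0.5 * np.sum(errors ** 2)


def minibatch_epoch(w, X, y, eta=0.01, batch_size=10, rng=None):
    """One pass over the data with one weight update per mini-batch."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(y))            # shuffle before slicing batches
    for start in range(0, len(y), batch_size):
        batch = order[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        errors = yb - (Xb.dot(w[1:]) + w[0])
        w[1:] += eta * Xb.T.dot(errors)        # summed gradient over the batch
        w[0] += eta * errors.sum()
    return w


# Synthetic, assumed data: 50 samples, 2 features, labels in {-1, 1}.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = np.where(X[:, 0] - X[:, 1] >= 0, 1.0, -1.0)

w = np.zeros(3)
cost_before = half_sse(w, X, y)
for _ in range(10):
    w = minibatch_epoch(w, X, y, eta=0.01, batch_size=10, rng=rng)
cost_after = half_sse(w, X, y)
```

Setting `batch_size=1` gives single-sample updates, and setting it to `len(y)` gives one batch update per epoch, so the batch size is the knob that moves between the two extremes.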

Next, we implement Adaline using stochastic gradient descent:

import numpy as np
from numpy.random import seed

class AdalineSGD(object):
     """ADAptive LInear NEuron classifier.

     Parameters
     ----------
     eta : float
         Learning rate (between 0.0 and 1.0).
     n_iter : int
         Passes over the training dataset.
     shuffle : bool (default: True)
         Shuffle training data every epoch
         if True to prevent cycles.
     random_state : int (default: None)
         Set random state for shuffling
         and initializing the weights.

     Attributes
     ----------
     w_ : 1d-array
         Weights after fitting.
     cost_ : list
         Average cost per training sample in every epoch.

     """
 
     def __init__(self, eta=0.01, n_iter=10, shuffle=True, random_state=None):
         self.eta = eta
         self.n_iter = n_iter
         self.w_initialized = False
         self.shuffle = shuffle
         # Compare against None so that random_state=0 is also honored.
         if random_state is not None:
             seed(random_state)
 
     def fit(self, X, y):
         """Fit training data.
 
         :param X:{array-like}, shape=[n_samples, n_features]
         :param y: array-like, shape=[n_samples]
         :return:
         self:object
 
         """
 
         self._initialize_weights(X.shape[1])
         self.cost_ = []
 
         for i in range(self.n_iter):
             if self.shuffle:
                 X, y = self._shuffle(X, y)
             cost = []
             for xi, target in zip(X, y):
                 cost.append(self._update_weights(xi, target))
             avg_cost = sum(cost)/len(y)
             self.cost_.append(avg_cost)
         return self
     
     def partial_fit(self, X, y):
         """Fit training data without reinitializing the weights."""
         if not self.w_initialized:
             self._initialize_weights(X.shape[1])
         if y.ravel().shape[0] > 1:
             for xi, target in zip(X, y):
                 self._update_weights(xi, target)
         else:
             self._update_weights(X, y)
         return self
     
     def _shuffle(self, X, y):
         """Shuffle training data"""
         r = np.random.permutation(len(y))
         return X[r], y[r]
     
     def _initialize_weights(self, m):
         """Initialize weights to zeros"""
         self.w_ = np.zeros(1 + m)
         self.w_initialized = True
     
     def _update_weights(self, xi, target):
         """Apply Adaline learning rule to update the weights"""
         output = self.net_input(xi)
         error = (target - output)
         self.w_[1:] += self.eta * xi.dot(error)
         self.w_[0] += self.eta * error
         cost = 0.5 * error ** 2
         return cost
     
     def net_input(self, X):
         """Calculate net input"""
         return np.dot(X, self.w_[1:]) + self.w_[0]
     
     def activation(self, X):
         """Compute linear activation"""
         return self.net_input(X)
     
     def predict(self, X):
         """Return class label after unit step"""
         return np.where(self.activation(X) >= 0.0, 1, -1)

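In the _shuffle method, the permutation function from numpy.random returns the integers from 0 to len(y) - 1 in random order (for example, 0 to 99 when there are 100 samples). Using this sequence to index both the feature matrix and the class-label vector shuffles the samples while keeping every sample paired with its label.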
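A small standalone demonstration of this indexing trick is shown below. The toy arrays are made up for the example; the rows of `X` are built so that it is easy to check that each row still lines up with its label after shuffling.

```python
import numpy as np

# Toy data (assumed for illustration): row k of X is [2k, 2k + 1], label is k.
X = np.arange(10).reshape(5, 2)
y = np.arange(5)

r = np.random.permutation(len(y))    # the integers 0..4 in random order
X_shuffled, y_shuffled = X[r], y[r]  # the same index array reorders both

# Each shuffled row still matches its label: first column equals 2 * label.
pairs_intact = np.array_equal(X_shuffled[:, 0], 2 * y_shuffled)
```

Because `X[r]` and `y[r]` create new arrays, the original `X` and `y` are left untouched.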

Now we train the model:

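# X_std and y (standardized features and class labels) are not defined
# in this post; they are presumably prepared as in the previous one.
ada = AdalineSGD(n_iter=15, eta=0.01, random_state=1)
ada.fit(X_std, y)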

Plot the decision regions and the training curve:

import matplotlib.pyplot as plt

# plot_decision_region is a helper that is not defined in this post.
plot_decision_region(X_std, y, classifier=ada)
plt.title('Adaline - Stochastic Gradient Descent')
plt.xlabel('sepal length [standardized]')
plt.ylabel('petal length [standardized]')
plt.legend(loc='upper left')
plt.show()

# Average cost per sample for each epoch
plt.plot(range(1, len(ada.cost_) + 1), ada.cost_, marker='o')
plt.xlabel('Epochs')
plt.ylabel('Average Cost')
plt.show()

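As the plots show, the average cost drops quickly, and after about 15 epochs the decision boundary looks very similar to the one obtained by Adaline with batch gradient descent.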

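Site notice: the content of this site comes from Cnblogs (博客園). If there is any infringement, please contact us and we will handle it promptly.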
