villerpa

Reputation: 1

How to implement regularization

My task was to implement model parameter tuning using stochastic gradient descent. Below is my implementation. However, I would like to add some form of regularization.

import numpy as np

def gradient(X, y, w, batch, alpha):
    # one update step on a mini-batch: build the gradient vector
    # dimension by dimension, then take a step of size alpha
    gradients = []
    # mean of 2 * (prediction - target) over the batch
    error = np.mean((np.dot(X[batch], w) - y[batch]) * 2)
    for dim in range(X.shape[1]):
        if dim == X.shape[1] - 1:
            # last column is the bias (column of ones)
            gradients.append(error * np.mean(X[batch, dim]))
        else:
            gradients.append(error * np.mean(X[batch, dim]) + 40 * w[dim])
    return w - alpha * np.array(gradients)

def MBGD(X, y, batch_size=64, n_iter=10000, alpha=10e-9):
    i = 0
    # append a column of ones so the last weight acts as the bias
    X = np.column_stack((X, [1] * len(X)))
    w = np.ones(X.shape[1])
    while i < n_iter:
        # sample a random mini-batch of row indices (with replacement)
        batch = np.random.randint(0, X.shape[0], batch_size)
        # learning rate decays with the iteration count
        w = gradient(X, y, w, batch, (1 / (i + 1)) * alpha / np.sqrt(i + 1))
        i += 1
    return w

I tried using the preprocessing module from sklearn, but its functions do not implement regularization in this sense. How can I do this?

Upvotes: -1

Views: 163

Answers (1)

Yazeed Alnumay

Reputation: 195

One of the most common and easiest-to-implement regularization methods is L2 regularization, which penalizes the sum of the squared weights. The modified loss becomes loss + c*(w1^2 + w2^2 + ...), and taking its derivative gives loss_gradient + 2*c*w (the constant factor of 2 can simply be absorbed into c). Therefore, we can modify the update rule to the following:

return w - alpha * (np.array(gradients) + c*w)

Here c is a scaling hyperparameter we need to tune. It determines the strength of the regularization: the higher it is, the more aggressively the model suppresses large weights. Without knowing much about your task, a decent initial guess would be c = 1e-3 or c = 1e-4.
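
If it helps, here is a rough sketch of how this could be wired into your gradient function. The extra keyword argument c and its default value are only illustrative; the rest of the body is unchanged from your question, only the final update rule is modified:

import numpy as np

def gradient(X, y, w, batch, alpha, c=1e-3):
    # same gradient computation as in the question; only the return
    # line changes, adding the L2 penalty c * w to the gradient vector
    gradients = []
    error = np.mean((np.dot(X[batch], w) - y[batch]) * 2)
    for dim in range(X.shape[1]):
        if dim == X.shape[1] - 1:
            gradients.append(error * np.mean(X[batch, dim]))
        else:
            gradients.append(error * np.mean(X[batch, dim]) + 40 * w[dim])
    return w - alpha * (np.array(gradients) + c * w)

Inside MBGD you would then call it as gradient(X, y, w, batch, (1/(i+1)) * alpha / np.sqrt(i + 1), c=1e-3) with whatever value of c you settle on. Note that this form also penalizes the bias weight (the appended column of ones); if you prefer to leave the bias unregularized, zero out the last entry of c * w before adding it.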

Upvotes: 1
