Reputation: 355
I generate a simple linear model in which the X variables (dimension D) are drawn from a multivariate normal distribution with zero covariance between variables. Only the first 10 variables have true coefficients of 1; the rest have coefficients of 0. Hence, theoretically, the ridge regression estimates should be the true coefficients divided by (1+C), where C is the penalty constant.
import numpy as np
from sklearn import linear_model
def generate_data(n):
    d = 100
    w = np.zeros(d)
    for i in range(0,10):
        w[i] = 1.0
    trainx = np.random.normal(size=(n,d))
    e = np.random.normal(size=(n))
    trainy = np.dot(trainx, w) + e
    return trainx, trainy
Then I use:
n = 200
x,y = generate_data(n)
regr = linear_model.Ridge(alpha=4,normalize=True)
regr.fit(x, y)
print(regr.coef_[0:20])
Under normalize = True, I get the first 10 coefficients at around 20% (i.e. 1/(1+4)) of the true value of 1. With normalize = False, I get the first 10 coefficients at around 1, which is the same result a simple linear regression model gives. Moreover, since I generate the data with mean = 0 and std = 1, normalize = True shouldn't do anything, as the data is already "normalized". Can someone explain to me what is going on here? Thanks!
Upvotes: 1
Views: 7746
Reputation: 1
If you set normalize=True, every feature column is divided by its L2 norm. In other words, the magnitude of every feature column is reduced, which makes the estimated coefficients larger (Xβ should stay more or less constant, so the smaller X is, the larger β must be). When the coefficients are larger, a greater L2 penalty is imposed. The objective thus puts more weight on the L2 penalty than on the linear part (Xβ). As a result, the coefficient estimates are less accurate than those of a pure linear regression.
By contrast, if normalize=False, X is bigger and β is smaller. Given the same alpha, the L2 penalty is marginal. More weight is on the linear part, so the result is close to a pure linear regression.
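The scaling argument above can be checked numerically. The following is a minimal sketch, not scikit-learn's actual implementation: it assumes the behaviour described in this thread (center each column, divide by its L2 norm, then map the coefficients back to the original units) and uses a hand-written closed-form ridge solver, `ridge_closed_form`, which is a hypothetical helper introduced only for illustration. With n = 200 standard-normal rows, each column norm is roughly sqrt(n), about 14, so a penalty of alpha = 4 on the rescaled problem acts like a penalty of about alpha * n = 800 on the original one. That is large compared with the diagonal of X^T X (about 200), which gives shrinkage of roughly 200 / (200 + 800) = 0.2.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Hypothetical helper: solve min ||y - X b||^2 + alpha * ||b||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

# Same setup as the question: 100 features, first 10 have coefficient 1.
rng = np.random.default_rng(0)
n, d, alpha = 200, 100, 4.0
w = np.zeros(d)
w[:10] = 1.0
X = rng.normal(size=(n, d))
y = X @ w + rng.normal(size=n)

# Center the data, as fitting an intercept would.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Case 1: no rescaling (the analogue of normalize=False).
coef_raw = ridge_closed_form(Xc, yc, alpha)

# Case 2: divide each centered column by its L2 norm, fit, then divide
# the coefficients by the same norms to return to the original units
# (assumed analogue of normalize=True).
norms = np.linalg.norm(Xc, axis=0)
coef_scaled = ridge_closed_form(Xc / norms, yc, alpha) / norms

print("column norms (first 3):", norms[:3])      # each roughly sqrt(n)
print("mean of first 10, raw:   ", coef_raw[:10].mean())
print("mean of first 10, scaled:", coef_scaled[:10].mean())
```

Under these assumptions the unscaled fit stays close to the true value of 1, while the rescaled fit lands near 0.2. Because the columns are only approximately orthogonal, the shrinkage is not exactly 1/(1+alpha).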
Upvotes: 0
Reputation: 169
It's important to understand that normalizing and standardizing are not the same thing, and you cannot do both at once. You can either normalize or standardize.
Standardizing usually refers to transforming the data so that it has zero mean and unit variance, e.g. by subtracting the mean and dividing by the standard deviation. In this case, that would be done feature-wise (per column).
Normalizing commonly refers to rescaling the data values to a fixed range such as 0 to 1, or to scaling each vector to unit length, e.g. by dividing by the length (L2 norm) of the vector. Neither of these makes the mean 0 and the variance 1.
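The difference between the two transformations can be seen on a single column. This is a small illustrative sketch using only NumPy; the sample column and the variable names are made up for the example. It standardizes a column (subtract the mean, divide by the standard deviation) and, separately, centers it and divides by its L2 norm.

```python
import numpy as np

rng = np.random.default_rng(0)
col = rng.normal(loc=3.0, scale=2.0, size=200)  # made-up feature column
centered = col - col.mean()

# Standardizing: zero mean, unit standard deviation.
standardized = centered / col.std()

# L2-normalizing the centered column: unit length, not unit variance.
l2_normalized = centered / np.linalg.norm(centered)

print("std after standardizing:    ", standardized.std())
print("norm after L2-normalizing:  ", np.linalg.norm(l2_normalized))
print("std after L2-normalizing:   ", l2_normalized.std())
print("1 / sqrt(n) for comparison: ", 1 / np.sqrt(col.size))
```

The L2-normalized column has length 1, so its standard deviation is 1/sqrt(n) rather than 1. Data that is already standardized is therefore still changed by L2 normalization.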
After generating trainx, trainy, they're not normalized yet. Maybe print them to see for yourself.
With normalize=True, trainx will be normalized by subtracting the mean and dividing by the l2-norm (according to sklearn).
With normalize=False, trainx will remain as is.
Upvotes: 2