imsc

Reputation: 7840

Normalization in multiple-linear regression

I have a data set for which I would like to build a multiple linear regression model. To compare the different independent variables, I normalize them by their standard deviation. I used sklearn.linear_model for this. I thought that this normalization would not affect the coefficient of determination, i.e., the R2 value of the prediction; only the parameters of the estimator would be different. I got this expected result with LinearRegression; however, the results are different when I use ElasticNet.

I am wondering whether my assumption that the R2 value is unchanged by normalization is valid. If it is not, is there another way to achieve what I want, namely to compare the relative importance of the variables?

import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn import datasets

# Load the data
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target
# Scale each feature to unit standard deviation (no centering)
X1 = X/X.std(0)

regrLinear = LinearRegression(normalize=False)
regrLinear.fit(X,y)

regrLinear.score(X,y)
0.51774942541329372

regrLinear.fit(X1,y)
regrLinear.score(X1,y)
0.51774942541329372

regrLinear = LinearRegression(normalize=True)
regrLinear.fit(X,y)
regrLinear.score(X,y)
0.51774942541329372

regrEN = ElasticNet(normalize=False)
regrEN.fit(X,y)
regrEN.score(X,y)
0.00883477003833

regrEN.fit(X1,y)
regrEN.score(X1,y)
0.48426155538537963

regrEN = ElasticNet(normalize=True)
regrEN.fit(X,y)
regrEN.score(X,y)
0.008834770038326667

Upvotes: 3

Views: 5115

Answers (1)

user1669710

Reputation: 234

regrEN = ElasticNet(normalize=True)
regrEN.fit(X,y)
print(regrEN.score(X,y))
0.00883477003833

regrEN.fit(X1,y)
print(regrEN.score(X1,y))
0.00883477003833

I get the same score in both cases. I wonder how your script is running with regr.score; maybe it is printing something else from code that you didn't include in your example?
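
On the underlying question of whether R2 should be unchanged by rescaling: for ordinary least squares it is, but for a penalized fit it generally is not, because the penalty acts on the coefficients and their size depends on the units of each feature. Below is a minimal sketch of that effect using plain NumPy. Note the assumptions: it uses a closed-form L2 (ridge) penalty as a stand-in for ElasticNet, not sklearn's actual solver, and the data, the helper names (fit_ridge, r2_of_fit) and the value alpha=10 are made up for illustration.

```python
import numpy as np

def r2_score(y, y_pred):
    # Coefficient of determination: 1 - RSS / TSS
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_ridge(X, y, alpha):
    # Closed-form L2-penalized least squares with an unpenalized intercept.
    # alpha=0 reduces to ordinary least squares.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

def r2_of_fit(X, y, alpha):
    # Fit on (X, y) and report the in-sample R2, like estimator.score(X, y)
    coef, intercept = fit_ridge(X, y, alpha)
    return r2_score(y, X @ coef + intercept)

# Synthetic (hypothetical) data with features on deliberately different scales
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3)) * np.array([0.1, 1.0, 10.0])
y = X @ np.array([5.0, 2.0, 0.3]) + rng.normal(size=n)

# Same scaling as in the question: divide by the standard deviation
X_scaled = X / X.std(axis=0)

r2_ols_raw = r2_of_fit(X, y, alpha=0.0)
r2_ols_scaled = r2_of_fit(X_scaled, y, alpha=0.0)
r2_pen_raw = r2_of_fit(X, y, alpha=10.0)
r2_pen_scaled = r2_of_fit(X_scaled, y, alpha=10.0)

print("OLS       raw vs scaled:", r2_ols_raw, r2_ols_scaled)
print("penalized raw vs scaled:", r2_pen_raw, r2_pen_scaled)
```

The two OLS scores should agree up to floating-point error, while the two penalized scores should differ. If the goal is to compare coefficients across variables with a penalized model, it seems reasonable to scale the features first and then judge the fit on the scaled data.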

Upvotes: 1
