Linear model prediction is inconsistent

Question

I have some data:

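from sklearn.utils import shuffle
from sklearn.preprocessing import scale

df_shuffled = shuffle(df, random_state=123)
X = scale(df_shuffled[df_shuffled.columns[:-1]])  # standardize every column except the target
y = df_shuffled["cnt"]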

Then I fit a simple linear model:

from sklearn.linear_model import LinearRegression
regr = LinearRegression()
regr.fit(X, y)

I know that I should not use the training sample to validate the model, but I just want to check whether I understand how it works.

I get good predictions from the model:

regr.predict(X)[:5]
array([ 5454.26166397,  3943.78784705,  2125.27231678,  2967.02153671,
    4474.29945607])

This is pretty close to the original data:

y[:5]
488    6421
421    3389
91     2252
300    3747
177    4708
Name: cnt, dtype: int64

I also have the coefficients:

list(zip(df.columns, regr.coef_))
[('season', 570.86663333510262),
 ('yr', 1021.9670828146905),
 ('mnth', -141.30042168132388),
 ('holiday', -86.757534933339258),
 ('weekday', 137.22544688027938),
 ('workingday', 56.39322955869352),
 ('weathersit', -330.23017254975974),
 ('temp', 367.45598306317618),
 ('atemp', 585.57493105545359),
 ('hum', -145.60889630046199),
 ('windspeed(mph)', 12457254171589.174),
 ('windspeed(ms)', -12457254171787.625)]

As far as I know, the fitted model makes predictions as y = Xw, where y is the vector of predicted values, X is the data matrix, and w is the coefficient vector (regr.coef_). But this does not work!

np.dot(X, regr.coef_)[:5]
array([  949.90689164,  -560.56692528, -2379.08245555, -1537.33323562,
     -30.05531626])

This is completely different from what the .predict method returns. Why? I don't understand.
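
One thing worth checking here is the intercept. LinearRegression fits one by default (fit_intercept=True), so the prediction is y = Xw + b with b stored in regr.intercept_, not y = Xw alone. That would fit the numbers above: each of the five .predict values exceeds the matching np.dot value by the same constant, about 4504.35. Below is a minimal sketch of that check. It runs on synthetic data, because the original df is not available; X_demo, y_demo and the coefficient values used to generate them are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import scale

# Synthetic stand-in for the question's data (an assumption, not the real dataset).
rng = np.random.default_rng(0)
X_demo = scale(rng.normal(size=(100, 3)))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + 4500.0 + rng.normal(scale=0.1, size=100)

model = LinearRegression()  # fit_intercept=True is the default
model.fit(X_demo, y_demo)

# The product alone omits the constant term ...
without_intercept = np.dot(X_demo, model.coef_)
# ... adding intercept_ should reproduce .predict.
with_intercept = without_intercept + model.intercept_

print(np.allclose(with_intercept, model.predict(X_demo)))
print(np.allclose(without_intercept, model.predict(X_demo)))
```

If the intercept is the cause, np.dot(X, regr.coef_) + regr.intercept_ on the original data should match regr.predict(X). Because X was standardized with scale, every column has zero mean, so the fitted intercept should be approximately the mean of y.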
