softcomp

Reputation: 41

Linear Regression - Incorrect output

I have a dataset with two columns, ["A", "B"], where "A" is the input variable and "B" is the target variable. All values are integers.

My code:

X.shape
>>(2540, 1)

y.shape
>>(2540, 1)

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)
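After this step X no longer holds raw integers: StandardScaler subtracts each column's mean and divides by its (population) standard deviation, so the values become small numbers centred on zero. A quick check, using a few hypothetical values in place of the real column "A" (which is not shown):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical integer inputs standing in for column "A"
X = np.array([[1000], [2000], [3000], [3687], [4500]], dtype=float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# StandardScaler applies (x - mean) / std per column, with the population std (ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(scaler.mean_, scaler.scale_)    # the stored mean and std
print(np.allclose(X_scaled, manual))  # True
print(X_scaled[3])                    # 3687 after scaling: a small value, nowhere near 3687
```

Any model fitted on X_scaled therefore expects inputs on this scaled axis.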

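import numpy as np
from sklearn.model_selection import train_test_split
# random_state seeds the shuffle so the split is reproducible
# (np.random.rand(4) only draws four random numbers and seeds nothing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)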

Linear regression from scikit-learn:

from sklearn.linear_model import LinearRegression

regr = LinearRegression(fit_intercept=True)
regr.fit(X_train, y_train)

print('Coefficients: ', regr.coef_)
print('Intercept: ', regr.intercept_)
>>Coefficients:  [[43.95569425]]
>>Intercept:  [100.68681298]
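Since the model was fitted on scaled X, these numbers describe the line y ≈ intercept + coef * (x - mean) / std, not y ≈ intercept + coef * x. A sketch with hypothetical synthetic data showing how they map back to raw units, and that the result matches a model fitted on the unscaled column directly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical data standing in for columns "A" (input) and "B" (target)
rng = np.random.default_rng(0)
X = rng.integers(0, 5000, size=(200, 1)).astype(float)
y = 0.05 * X + 30 + rng.normal(0, 5, size=(200, 1))

scaler = StandardScaler().fit(X)
regr_scaled = LinearRegression().fit(scaler.transform(X), y)  # as in the question
regr_raw = LinearRegression().fit(X, y)                       # same data, no scaling

# Rewrite intercept + coef * (x - mean) / std as a line in raw x
mean, std = scaler.mean_[0], scaler.scale_[0]
coef = regr_scaled.coef_[0, 0]
slope_raw = coef / std
intercept_raw = regr_scaled.intercept_[0] - coef * mean / std

print(slope_raw, regr_raw.coef_[0, 0])        # the two slopes agree
print(intercept_raw, regr_raw.intercept_[0])  # and so do the intercepts
```

Ordinary least squares with an intercept is unaffected by shifting and rescaling x, so scaling changes how the coefficients read, not the fitted line itself.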

I got an R² value of 0.93.
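If the R² was computed on X_test (the code for it is not shown, so this is an assumption), it used inputs from the same scaled X, which is why it looks reasonable. A sketch with hypothetical, already-standardized data confirming that regr.score is the usual 1 - SS_res / SS_tot:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical inputs that are already standardized, as X is after the scaler
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))
y = 40 * X + 100 + rng.normal(0, 10, size=(500, 1))
X_train, X_test, y_train, y_test = X[:400], X[400:], y[:400], y[400:]

regr = LinearRegression().fit(X_train, y_train)
y_pred = regr.predict(X_test)

# R² = 1 - SS_res / SS_tot, computed on the held-out rows
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, regr.score(X_test, y_test), r2_score(y_test, y_pred))
```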

The last record in X_train is 3687, and the corresponding y_train value is 212.220001.

I used that record for prediction:

regr.predict([[3687]] )
>>array([161825.22279211])

I do not understand what is happening. I expected the predicted value to be around 212, but it is 161825.

Could you please explain the reason? Thanks.
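The symptom can be reproduced with hypothetical synthetic data: a model trained on standardized X returns a huge number when given the raw value 3687, and a sensible one when that value first goes through the same fitted scaler.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical data: B is roughly 0.05 * A + 30, so A = 3687 should give about 214
rng = np.random.default_rng(1)
X = rng.integers(0, 5000, size=(500, 1)).astype(float)
y = 0.05 * X + 30 + rng.normal(0, 5, size=(500, 1))

scaler = StandardScaler().fit(X)
regr = LinearRegression().fit(scaler.transform(X), y)

x_new = np.array([[3687.0]])
wrong = regr.predict(x_new)[0, 0]                    # raw value, but the model expects scaled values
right = regr.predict(scaler.transform(x_new))[0, 0]  # same scaling as during training

print(f"raw input:    {wrong:.1f}")
print(f"scaled input: {right:.1f}")
```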

Upvotes: 1

Views: 295

Answers (1)

Poe Dator

Reputation: 4912

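Perhaps you need to pass your input through the fitted scaler before feeding it to the regression, because the model was trained on scaled X. Try regr.predict(scaler.transform([[3687]])) (scaler.transform expects a 2D array, one row per sample).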
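One way to avoid forgetting the scaling step is to chain the scaler and the regression with scikit-learn's make_pipeline, which reapplies the fitted scaling inside predict() and score(). A sketch on hypothetical synthetic data; splitting before fitting also means the scaler only sees the training rows:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data with the same shape as in the question
rng = np.random.default_rng(2)
X = rng.integers(0, 5000, size=(2540, 1)).astype(float)
y = 0.05 * X + 30 + rng.normal(0, 5, size=(2540, 1))

# Split first, so the scaler is fitted on the training rows only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

# The pipeline fits StandardScaler, then LinearRegression on the scaled output
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

print(model.score(X_test, y_test))  # R² on the held-out rows
print(model.predict([[3687.0]]))    # raw input; scaling happens inside the pipeline
```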

Upvotes: 2
