Reputation: 57
I am trying to improve my grasp of linear regression and multiple linear regression. I saw this video on YouTube in which the presenter used the regression tool in Excel to perform linear regression on a set of data.
https://www.youtube.com/watch?v=HgfHefwK7VQ&list=PLo8L7S3J29iOX0pvRqAgLDDdwobNWqG9C&index=21&t=0s
His final prediction, using A, B, and C as the independent (predictor) variables, was 45149.21.
Cost was the dependent variable.
This is the method I've been using to try to replicate his results:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
# create linear regression object
lm = LinearRegression()
# develop a model using these variables as predictor variables
X = df[['A Made', 'B Made', 'C Made']]
Y = df['Cost']
# Fit the linear model using the three above-mentioned variables.
lm.fit(X, Y)
# value of the intercept
intercept = lm.intercept_
# values of the coefficients
coef = lm.coef_.tolist()
# prediction from the fitted linear model for A = 1200, B = 800, C = 1000
Z = intercept + (coef[0] * 1200) + (coef[1] * 800) + (coef[2] * 1000)
The values printed are:
Z = 10606.098714826765
intercept = 35108.59711204488
coefficient (list) = [2.072061216849437, 4.153422708041111, 4.796887088174573]
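As a quick arithmetic check on the numbers above (a sketch that only reuses the printed values; the names terms_only and with_intercept are illustrative): the three coefficient terms alone sum to about 10606.10, which matches the printed Z, while adding the printed intercept gives about 45714.70. That suggests the intercept was not actually added in the run that produced Z.

```python
# Values copied from the output printed above.
intercept = 35108.59711204488
coef = [2.072061216849437, 4.153422708041111, 4.796887088174573]
inputs = [1200, 800, 1000]  # A Made, B Made, C Made

# Sum of coefficient * input, leaving the intercept out.
terms_only = sum(c * x for c, x in zip(coef, inputs))

# Full linear model: intercept plus the coefficient terms.
with_intercept = intercept + terms_only

print(round(terms_only, 2))
print(round(with_intercept, 2))
```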
The actual data in question:
data = {
    'Month': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19],
    'Cost': [44439,43936,44464,41533,46343,44922,43203,43000,40967,48582,45003,44303,42070,44353,45968,47781,43202,44074,44610],
    'A Made': [515,929,800,979,1165,651,847,942,630,1113,1086,843,500,813,1190,1200,731,1089,786],
    'B Made': [541,692,710,685,1147,939,755,908,738,1175,1075,640,752,989,823,1108,590,607,513],
    'C Made': [928,711,824,758,635,901,580,589,682,1050,984,828,708,804,904,1120,1065,1132,839]
}
df = pd.DataFrame(data)
I expected the predicted value to be close to that 44000 value. What am I doing wrong?
EDIT: Relieved to find the process was correct. When I examined it again, the intercept was printing as -2. After I adjusted the code to assign the intercept value, the result is back where it should be.
THANK YOU to all who answered. Greatly appreciated!
Upvotes: 0
Views: 146
Reputation: 2119
Z = 45714.69582687167
That's what I get by running your code, which is close to 44000.
The only change I made was to the import:
from sklearn.linear_model import LinearRegression
Upvotes: 1
Reputation: 1344
Run it again; your process is correct. You also do not need to extract the coefficients and intercept manually:
x_test = [[1200, 800, 1000]]
y_predict = lm.predict(x_test)
Output:
array([[45714.69582687]])
By the way, fix the import: from sklearn.linear_model import LinearRegression
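For an independent cross-check, here is a minimal sketch that fits the same model with NumPy's least-squares solver instead of scikit-learn (a swapped-in technique, not the code from the question). It assumes an ordinary least-squares fit with an intercept, which is what LinearRegression does by default; the variable names (design, params, prediction) are illustrative. It should land on roughly the same intercept (about 35108.60) and prediction (about 45714.70) quoted in this thread.

```python
import numpy as np

# Data copied from the question.
cost = np.array([44439, 43936, 44464, 41533, 46343, 44922, 43203, 43000, 40967,
                 48582, 45003, 44303, 42070, 44353, 45968, 47781, 43202, 44074,
                 44610], dtype=float)
a_made = np.array([515, 929, 800, 979, 1165, 651, 847, 942, 630, 1113, 1086,
                   843, 500, 813, 1190, 1200, 731, 1089, 786], dtype=float)
b_made = np.array([541, 692, 710, 685, 1147, 939, 755, 908, 738, 1175, 1075,
                   640, 752, 989, 823, 1108, 590, 607, 513], dtype=float)
c_made = np.array([928, 711, 824, 758, 635, 901, 580, 589, 682, 1050, 984,
                   828, 708, 804, 904, 1120, 1065, 1132, 839], dtype=float)

# Design matrix: a column of ones for the intercept, then the three predictors.
design = np.column_stack([np.ones_like(a_made), a_made, b_made, c_made])

# Ordinary least squares: minimise ||design @ params - cost||^2.
params, *_ = np.linalg.lstsq(design, cost, rcond=None)
intercept, coef = params[0], params[1:]

# Prediction for A = 1200, B = 800, C = 1000.
prediction = intercept + coef @ np.array([1200.0, 800.0, 1000.0])

print("intercept:", intercept)
print("coefficients:", coef)
print("prediction:", prediction)
```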
Upvotes: 1
Reputation: 18377
I just tried your code and got this when printing Z: 45714.69582687167
The only thing I changed is the import, from: from sklearn.linear_model import LinearRegression()
to: from sklearn.linear_model import LinearRegression
Upvotes: 1