BusterTheFatCat

Reputation: 57

What am I doing wrong with this multiple linear regression

I am trying to improve my grasp of linear regression and multiple linear regression. I saw a video on YouTube where the presenter used the regression tool in Excel to fit a linear regression to a set of data.

https://www.youtube.com/watch?v=HgfHefwK7VQ&list=PLo8L7S3J29iOX0pvRqAgLDDdwobNWqG9C&index=21&t=0s

His final prediction, using A, B, and C as the independent (predictor) variables, was 45149.21.

Cost was the dependent variable.

This is the method I've been using to try to replicate his results (df is the DataFrame built from the data shown further down):

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

# create linear regression object
lm = LinearRegression()

# develop a model using these variables as predictor variables
X = df[['A Made', 'B Made', 'C Made']]   
Y = df['Cost']

# Fit the linear model using the three above-mentioned variables.
lm.fit(X , Y)

# value of the intercept
intercept = lm.intercept_

# values of the coefficients
coef = lm.coef_.tolist()

# final estimated linear model
Z = intercept + (coef[0] * 1200) + (coef[1] * 800) + (coef[2] * 1000)

The values printed out are:

Z = 10606.098714826765
intercept = 35108.59711204488
coefficient (list) = [2.072061216849437, 4.153422708041111, 4.796887088174573]

The actual data in question

data = {

    'Month':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19],
    'Cost':[44439,43936,44464,41533,46343,44922,43203,43000,40967,48582,45003,44303,42070,44353,45968,47781,43202,44074,44610],
    'A Made':[515,929,800,979,1165,651,847,942,630,1113,1086,843,500,813,1190,1200,731,1089,786],
    'B Made':[541,692,710,685,1147,939,755,908,738,1175,1075,640,752,989,823,1108,590,607,513],
    'C Made':[928,711,824,758,635,901,580,589,682,1050,984,828,708,804,904,1120,1065,1132,839]

}

df = pd.DataFrame(data)

I expected the predicted value to be close to that 44000 value. What am I doing wrong?

EDIT: Relieved to find the process was correct. When I examined it again, the intercept was printing as -2. After I made some adjustments and reassigned the intercept value, the result is back to where it should be.

THANK YOU to all who answered. Greatly appreciated!

Upvotes: 0

Views: 146

Answers (3)

Péter Leéh

Reputation: 2119

Z = 45714.69582687167

That's what I get by running your code, which is close to 44000.

The only thing I changed was the import:

from sklearn.linear_model import LinearRegression
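
As a quick sanity check, the intercept and coefficients printed in the question can be plugged into the same formula by hand. This is only a sketch of the arithmetic; the names `units` and `slope_terms` are illustrative and not from the question. The coefficient terms alone sum to about 10606.10, which matches the Z printed in the question, so that value was probably computed without the intercept (for example, from a stale variable).

```python
# Values copied from the output printed in the question.
intercept = 35108.59711204488
coef = [2.072061216849437, 4.153422708041111, 4.796887088174573]
units = [1200, 800, 1000]  # A, B, C quantities used in the question's formula

# Sum of coefficient * quantity terms, without the intercept.
slope_terms = sum(c * u for c, u in zip(coef, units))

# Full linear model: intercept plus the slope terms.
z = intercept + slope_terms

print(round(slope_terms, 2))  # 10606.1
print(round(z, 2))            # 45714.7
```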

Upvotes: 1

Mr_U4913

Reputation: 1344

Run it again; your process is correct. You also do not need to extract the coefficients and intercept manually, because lm.predict applies them for you:

x_test = [[1200, 800, 1000]]
y_predict = lm.predict(x_test)

Output:

array([[45714.69582687]])

Btw, fix the import: from sklearn.linear_model import LinearRegression
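
For reference, here is a minimal end-to-end sketch that defines the data before fitting and then calls predict. It assumes pandas and scikit-learn are installed; the `features` list and passing the test point as a DataFrame (to keep the column names consistent with the fit) are my own choices, not something the question requires.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Data copied from the question.
data = {
    'Month': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19],
    'Cost': [44439,43936,44464,41533,46343,44922,43203,43000,40967,48582,45003,44303,42070,44353,45968,47781,43202,44074,44610],
    'A Made': [515,929,800,979,1165,651,847,942,630,1113,1086,843,500,813,1190,1200,731,1089,786],
    'B Made': [541,692,710,685,1147,939,755,908,738,1175,1075,640,752,989,823,1108,590,607,513],
    'C Made': [928,711,824,758,635,901,580,589,682,1050,984,828,708,804,904,1120,1065,1132,839],
}
df = pd.DataFrame(data)

# Predictor columns (illustrative helper name).
features = ['A Made', 'B Made', 'C Made']

# Fit Cost as a linear function of the three predictors.
lm = LinearRegression()
lm.fit(df[features], df['Cost'])

# Predict the cost for 1200 of A, 800 of B, and 1000 of C.
x_test = pd.DataFrame([[1200, 800, 1000]], columns=features)
y_predict = lm.predict(x_test)
print(y_predict)
```

The printed value should be close to the 45714.70 reported in the answers here. Fitting and predicting in a single fresh run also avoids picking up a stale intercept or Z left over from an earlier session.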

Upvotes: 1

Celius Stingher

Reputation: 18377

I just tried your code and got this when printing Z: 45714.69582687167. The only thing I changed is the import, from "from sklearn.linear_model import LinearRegression()" to "from sklearn.linear_model import LinearRegression".

Upvotes: 1
