otaku

Reputation: 1029

Linear Regression overfitting

I'm working through course 2, on linear regression, of this Coursera specialization (https://www.coursera.org/specializations/machine-learning).

I've already done this with graphlab, but I wanted to try sklearn for the experience and learning, so I'm using sklearn and pandas here.

The model overfits the data. How can I fix this?

These are the coefficients I'm getting:

[ -3.33628603e-13 1.00000000e+00]

This is the code:

poly1_data = polynomial_dataframe(sales["sqft_living"], 1)
poly1_data["price"] = sales["price"]
model1 = LinearRegression()
model1.fit(poly1_data, sales["price"])
print(model1.coef_)
plt.plot(poly1_data['power_1'], poly1_data['price'], '.',poly1_data['power_1'], model1.predict(poly1_data),'-')
plt.show()

The plotted line looks like this; as you can see, it connects every data point: [image: fitted line]. And this is the plot of the input data: [image: input data]

Upvotes: 1

Views: 3306

Answers (1)

ilanman

Reputation: 838

I wouldn't even call this overfitting. I'd say you aren't doing what you think you're doing. In particular, you forgot to add a column of 1's to your design matrix, X. For example:

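import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# generate some univariate data
x = np.arange(100)
y = 2*x + x*np.random.normal(0,1,100)
df = pd.DataFrame({'x': x, 'y': y})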

You're doing the following:

model1 = LinearRegression()
X = df["x"].values.reshape(1,-1)  # shape (1, 100): a single row, i.e. one sample with 100 features
y = df["y"].values.reshape(1,-1)  # shape (1, 100): one sample with 100 targets
model1.fit(X,y)

Which leads to:

plt.plot(df['x'].values, df['y'].values,'.')
plt.plot(X[0], model1.predict(X)[0],'-')
plt.show()

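To see why a line like this passes through every point, it can help to check the shapes handed to fit. The following is only a sketch on synthetic data like the above (the fixed seed and the names X_row, Y_row and row_model are my own assumptions): with a (1, 100) array, sklearn sees a single sample with 100 features and 100 targets, so it can reproduce that one sample exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
x = np.arange(100)
y = 2 * x + x * rng.normal(0, 1, 100)

# reshape(1, -1) yields ONE row: a single sample with 100 features / targets
X_row = x.reshape(1, -1).astype(float)
Y_row = y.reshape(1, -1)
print(X_row.shape, Y_row.shape)  # (1, 100) (1, 100)

row_model = LinearRegression().fit(X_row, Y_row)
pred = row_model.predict(X_row)

# With a single training sample, the predictions simply reproduce the targets
print(np.allclose(pred[0], y))
```

Nothing is being learned about the relationship between x and y here; the "fit" is just the training targets played back.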

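Instead, you want one row per observation, plus a column of 1's, in your design matrix (X). Note that sklearn's LinearRegression also fits an intercept on its own by default (fit_intercept=True), so with that default the column of 1's is redundant, though harmless:

X = np.column_stack([np.ones(len(df)), df["x"].values])  # shape (100, 2): column of 1's, then x
y = df["y"].values  # shape (100,): one target per row of X
model1.fit(X,y)

And you get: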

plt.plot(df['x'].values, df['y'].values,'.')
plt.plot(df['x'].values, model1.predict(X),'-')
plt.show()
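As a side check on the column of 1's: as far as I know, sklearn's LinearRegression fits an intercept by default (fit_intercept=True), so an explicit column of 1's should only be strictly necessary when that option is switched off. A small sketch comparing the two routes on the same kind of synthetic data (seed and variable names are assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
x = np.arange(100, dtype=float)
y = 2 * x + x * rng.normal(0, 1, 100)

# Option 1: explicit column of 1's, with sklearn's own intercept switched off
X_ones = np.column_stack([np.ones_like(x), x])
manual = LinearRegression(fit_intercept=False).fit(X_ones, y)

# Option 2: a plain (100, 1) column; fit_intercept=True is the default
auto = LinearRegression().fit(x.reshape(-1, 1), y)

print(manual.coef_)                 # [intercept, slope]
print(auto.intercept_, auto.coef_)  # same intercept and slope, up to rounding
```

Either way, the important part is that X has one row per observation.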

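Looking back at the question's code, one more thing may be worth checking: poly1_data is given a "price" column before being passed to fit as the features, and the printed coefficients (roughly 0 for the first column and 1 for the second) are what one would expect when the target is among the inputs. A minimal sketch on made-up data (the column names mirror the question; the numbers and the seed are assumptions, not the course data):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
sqft = rng.uniform(500, 4000, 200)
price = 250 * sqft + rng.normal(0, 50_000, 200)  # synthetic prices

data = pd.DataFrame({"power_1": sqft, "price": price})

# Features include the target: the model can simply copy "price"
leaky = LinearRegression().fit(data, data["price"])
print(leaky.coef_)  # close to [0, 1]

# Features exclude the target: an ordinary straight-line fit
clean = LinearRegression().fit(data[["power_1"]], data["price"])
print(clean.coef_)  # a single slope
```

If that is what is happening, fitting and predicting on poly1_data[["power_1"]] only (or adding the "price" column after fitting) should give a straight line rather than one that touches every point.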

Upvotes: 3
