Chuong Nguyen

Reputation: 67

Plot polynomial regression with Scikit-Learn

I am writing Python code to investigate overfitting using the function sin(2πx) on the interval [0, 1]. I first generate N data points and add random noise drawn from a Gaussian distribution with mu=0 and sigma=1. I then fit the model using a polynomial of degree M. Here is my code:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# generate N random points
N=30
X= np.random.rand(N,1)
y= np.sin(np.pi*2*X)+ np.random.randn(N,1)

M=2
poly_features=PolynomialFeatures(degree=M, include_bias=False)
X_poly=poly_features.fit_transform(X) # contain original X and its new features
model=LinearRegression()
model.fit(X_poly,y) # Fit the model

# Plot
X_plot=np.linspace(0,1,100).reshape(-1,1)
X_plot_poly=poly_features.fit_transform(X_plot)
plt.plot(X,y,"b.")
plt.plot(X_plot_poly,model.predict(X_plot_poly),'-r')
plt.show()

[Image: Picture of polynomial regression]

I don't understand why I get M=2 lines instead of a single curve for the M-th degree polynomial. I think there should be one line regardless of M. Could you help me figure out this problem?

Upvotes: 4

Views: 12907

Answers (1)

pauli

Reputation: 4291

After the polynomial feature transformation, your data has shape (n_samples, 2). Because you pass that whole two-column array as the x argument, pyplot plots the predicted variable against each column separately, which gives two lines.
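To check this yourself, here is a minimal sketch that mirrors the question's setup. The fixed seed and the Agg backend are my own assumptions, added only so it runs reproducibly without a display. It prints the shape of the transformed grid and counts the lines pyplot draws:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)  # assumption: fixed seed for reproducibility
X = rng.random((30, 1))
y = np.sin(2 * np.pi * X) + rng.standard_normal((30, 1))

poly_features = PolynomialFeatures(degree=2, include_bias=False)
model = LinearRegression().fit(poly_features.fit_transform(X), y)

X_plot = np.linspace(0, 1, 100).reshape(-1, 1)
X_plot_poly = poly_features.transform(X_plot)
print(X_plot_poly.shape)  # (100, 2): columns are x and x**2

fig, ax = plt.subplots()
# Same call as in the question: a 2-column array is used as the x argument
drawn = ax.plot(X_plot_poly, model.predict(X_plot_poly), "-r")
print(len(drawn))  # 2: one Line2D per column of the x argument
```

The second line is the same set of predictions drawn against x**2 instead of x, so it is a plotting artifact and not a second fitted model.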

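Change the plot code to

plt.plot(X_plot_poly[:,i],model.predict(X_plot_poly),'-r')

where i is the column number to use for the x-axis. With include_bias=False, column 0 holds the original x values, so i=0 gives the expected single curve.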
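Put together, a corrected version of the script might look like the sketch below. It plots against X_plot directly, which is the same as column 0 of the transformed array here. The fixed seed, the Agg backend, and the switch from fit_transform to transform on the plotting grid are my own choices and not part of the original code; refitting PolynomialFeatures on the grid would give the same features, but reusing the fitted transformer is the more usual pattern.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)  # assumption: fixed seed for reproducibility
N, M = 30, 2
X = rng.random((N, 1))
y = np.sin(2 * np.pi * X) + rng.standard_normal((N, 1))

# Expand x into [x, x**2, ..., x**M] and fit a linear model on those features
poly_features = PolynomialFeatures(degree=M, include_bias=False)
X_poly = poly_features.fit_transform(X)
model = LinearRegression().fit(X_poly, y)

# Predict on a dense grid; the model needs the expanded features as input
X_plot = np.linspace(0, 1, 100).reshape(-1, 1)
X_plot_poly = poly_features.transform(X_plot)
y_plot = model.predict(X_plot_poly)

fig, ax = plt.subplots()
ax.plot(X, y, "b.")
# The x-axis uses only the original x column, so a single curve is drawn
curve = ax.plot(X_plot, y_plot, "-r")
```

In an interactive session, dropping the matplotlib.use("Agg") line and calling plt.show() at the end displays the figure. The same code should draw one curve for any M, since only the model input changes with the degree, not the x-axis.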

Upvotes: 4
