Reputation: 197
I am implementing a simple polynomial regression to predict the time for a video given its size, using my own dataset. For some reason, the plot shows multiple traces instead of a single curve.
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('estSize.csv')
X = dataset.iloc[:, 0].values.reshape(-1,1)
y = dataset.iloc[:, 1].values.reshape(-1,1)
from sklearn.linear_model import LinearRegression
# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 2)
X_poly = poly_reg.fit_transform(X)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)
# Visualising the Polynomial Regression results
plt.scatter(X, y, color = 'red')
plt.plot(X, lin_reg_2.predict(poly_reg.fit_transform(X)), color = 'blue')
plt.show()
Upvotes: 4
Views: 2262
Reputation: 11336
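Your data needs to be ordered with respect to the predictor. plt.plot connects the points in the order they are passed, so with unsorted X values the line jumps back and forth across the plot, which looks like multiple traces.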
After the line
dataset = pd.read_csv('estSize.csv')
add this line:
dataset = dataset.sort_values(by=['col1'])
where col1 is the column header for the file-size values.
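As a rough end-to-end sketch of the fix, the snippet below sorts by the predictor before fitting and plotting. It does not read estSize.csv: the data is synthetic, and the column names size and time, the random seed, and the coefficients used to generate the data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for estSize.csv (assumption: columns named 'size' and 'time').
rng = np.random.default_rng(0)
size = rng.uniform(1, 100, 50)  # deliberately unsorted, like rows in a raw CSV
time = 0.02 * size**2 + 0.5 * size + rng.normal(0, 5, 50)
dataset = pd.DataFrame({"size": size, "time": time})

# Sort by the predictor so plt.plot joins the points from left to right.
dataset = dataset.sort_values(by="size")

X = dataset[["size"]].to_numpy()  # shape (n, 1)
y = dataset["time"].to_numpy()    # shape (n,)

# Expand to [1, x, x^2] and fit an ordinary linear model on those features.
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(X)
lin_reg_2 = LinearRegression().fit(X_poly, y)

# transform (not fit_transform) reuses the already-fitted feature expansion.
y_pred = lin_reg_2.predict(poly_reg.transform(X))

fig, ax = plt.subplots()
ax.scatter(X[:, 0], y, color="red")
ax.plot(X[:, 0], y_pred, color="blue")  # a single smooth curve, since X is sorted
# plt.show()  # uncomment to display the figure
```

If you would rather not reorder the DataFrame, sorting only the arrays passed to plt.plot (for example with np.argsort on X) should give the same picture, since the fitted model itself does not depend on row order.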
Upvotes: 5