Thomas

Reputation: 109

Predict future values after using polynomial regression in Python

I'm currently using TensorFlow and scikit-learn to try to make a model that can predict the amount of sales for a certain product, X, based on the outdoor temperature in Celsius.

I took my dataset of temperatures and assigned it to the x variable, and the amount of sales to the y variable. As seen in the picture below, there is some correlation between the temperature and the amount of sales:

Graph made using matplotlib.pyplot

First and foremost, I tried to do linear regression to see how well it'd fit. This is the code for that:

from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

model = LinearRegression()
model.fit(x_train, y_train)  # fit the linear model to the x and y training data

# Let's try to plot it out.
y_pred = model.predict(x_train)

plt.scatter(x_train, y_train)
plt.plot(x_train, y_pred, 'r')
plt.legend(['Observed data', 'Predicted line'])  # scatter is drawn first, so its label comes first
plt.show()

This resulted in a predicted line that had a pretty poor fit:

Plot of the linear fit against the data
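One quick way to put a number on how poor the fit is (a small addition on my part, not in the original question) is the R² score that LinearRegression exposes via .score:

# R^2 of the linear fit on the training data; values near 0 indicate a poor fit
print(model.score(x_train, y_train))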

A very nice feature of scikit-learn, however, is that you can predict a value for a given temperature, so if I were to write

model.predict([[15]])

I'd get the output

array([6949.05567873])
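Note that scikit-learn's predict expects a 2-D array of shape (n_samples, n_features), which is why the single temperature above is wrapped twice. An equivalent spelling:

import numpy as np

model.predict(np.array([15]).reshape(-1, 1))  # shape (1, 1): one sample, one feature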

This is exactly what I want. I just wanted the line to fit better, so instead I tried polynomial regression with scikit-learn by doing the following:

from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=8, include_bias=False)  # no bias column; LinearRegression fits the intercept itself
x_new = poly.fit_transform(x_train)  # expand x into the features x^1 ... x^8
new_model = LinearRegression()
new_model.fit(x_new, y_train)

# plotting
y_prediction = new_model.predict(x_new)  # predicted y for every training point
plt.scatter(x_train, y_train)
plt.plot(x_new[:, 0], y_prediction, 'r')  # x_new[:, 0] is the original x (the x^1 column)
plt.legend(['Observed data', 'Predicted line'])
plt.show()

The line seems to fit better now:

Plot of the polynomial fit

My problem now is that I can't use new_model.predict(x), since it results in "ValueError: shapes (1,1) and (8,) not aligned: 1 (dim 1) != 8 (dim 0)". I understand that this is because I'm fitting an 8-degree polynomial, but is there any way for me to predict the y value for ONE temperature using the polynomial regression model?

Upvotes: 3

Views: 7541

Answers (1)

Manny

Reputation: 141

Try using new_model.predict([[x**a for a in range(1, 9)]]) or, matching the code you already have, new_model.predict(poly.transform([[x]])).

Since you fit a polynomial

y = a*x + b*x^2 + ... + h*x^8

you need to transform your input in the same manner, i.e. expand it into the powers x^1 ... x^8 without the bias (intercept) column. That expanded matrix is what you passed into the LinearRegression training function, and the model learns one coefficient per term. The plot you've shown only uses the x^1 column you indexed into (x_new[:,0]), which is just the original x; the fitted model itself uses all eight columns.
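As a quick sanity check (my own sketch, assuming a single scalar temperature x), the manual expansion and the fitted PolynomialFeatures transformer produce the same feature row:

import numpy as np

x = 15.0  # hypothetical single temperature
manual = np.array([[x**a for a in range(1, 9)]])  # shape (1, 8): x^1 ... x^8
via_poly = poly.transform([[x]])                  # same expansion, no bias column
assert np.allclose(manual, via_poly)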

One last note: always make sure your training data and future/validation data undergo the same preprocessing steps, or the model will be fed features in a different form than it saw during training.
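One way to bake that guarantee in (not in the original answer; a sketch using scikit-learn's Pipeline) is to chain the expansion and the regression, so predict accepts raw temperatures directly:

from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# the pipeline applies the polynomial expansion before every fit/predict call
pipe = make_pipeline(PolynomialFeatures(degree=8, include_bias=False),
                     LinearRegression())
pipe.fit(x_train, y_train)
print(pipe.predict([[15]]))  # raw temperature in, prediction out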

Here's some detail:

Let's start by running your code on synthetic data.

from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from numpy.random import rand

x_train = rand(1000, 1)
y_train = rand(1000, 1)

poly = PolynomialFeatures(degree=8, include_bias=False)  # no bias column; LinearRegression fits the intercept
x_new = poly.fit_transform(x_train)  # expand x into the features x^1 ... x^8
new_model = LinearRegression()
new_model.fit(x_new, y_train)

# plotting
y_prediction = new_model.predict(x_new)  # this predicts y
plt.scatter(x_train, y_train)
plt.plot(x_new[:, 0], y_prediction, 'r')  # x_new[:, 0] is the original x
plt.legend(['Observed data', 'Predicted line'])
plt.show()

Plot showing data and line fit

Now we can predict a y value by transforming an x value into the same degree-8 polynomial features (without a bias column):

print(new_model.predict(poly.transform([[0.25]])))

[[0.47974408]]
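On a related note (my addition, not from the original answer): to draw a smooth fitted curve rather than connecting unsorted training points, predict over a sorted grid of x values:

import numpy as np

# evaluate the fitted polynomial on an evenly spaced, sorted grid in [0, 1]
grid = np.linspace(0, 1, 200).reshape(-1, 1)
plt.scatter(x_train, y_train)
plt.plot(grid, new_model.predict(poly.transform(grid)), 'r')
plt.legend(['Observed data', 'Fitted curve'])
plt.show()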

Upvotes: 3
