Reputation: 390
My code is returning an error when I use PolynomialFeatures:
poly1 = PolynomialFeatures(degree=1)
poly3 = PolynomialFeatures(degree=3)
poly6 = PolynomialFeatures(degree=6)
poly9 = PolynomialFeatures(degree=9)
X_train = X_train.reshape(-1,1)
y_train = y_train.reshape(-1,1)
predictions = []
predict = np.linspace(0,10,100)
x_poly1 = poly1.fit_transform(X_train).reshape(-1,1)
X_train1, X_test1, y_train1, y_test1 = train_test_split(x_poly1, y_train)
linreg1 = LinearRegression().fit(X_train1, y_train1)
x_poly3 = poly3.fit_transform(X_train).reshape(-1,1)
X_train3, X_test3, y_train3, y_test3 = train_test_split(x_poly3, y_train)
linreg3 = LinearRegression().fit(X_train3, y_train3)
x_poly6 = poly6.fit_transform(X_train).reshape(-1,1)
X_train6, X_test6, y_train6, y_test6 = train_test_split(x_poly6, y_train)
linreg6 = LinearRegression().fit(X_train6, y_train6)
x_poly9 = poly9.fit_transform(X_train).reshape(-1,1)
X_train9, X_test9, y_train9, y_test9 = train_test_split(x_poly9, y_train)
linreg9 = LinearRegression().fit(X_train9, y_train9)
predict1 = poly1.fit_transform(predict).reshape(-1,1)
predict3 = poly3.fit_transform(predict).reshape(-1,1)
predict6 = poly6.fit_transform(predict).reshape(-1,1)
predict9 = poly9.fit_transform(predict).reshape(-1,1)
ans1 = linreg1.predict(predict1)
ans3 = linreg3.predict(predict3)
ans6 = linreg6.predict(predict6)
ans9 = linreg9.predict(predict9)
np.concatenate(ans1, ans3, ans6, ans9)
or alternatively
for i in enumerate([1,3,6,9]):
    poly = PolynomialFeatures(degree=i)
    x_poly = poly.fit_transform(X_train).reshape(-1,1)
    X_train, X_test, y_train, y_test = train_test_split(x_poly, y_train)
    linreg = LinearRegression().fit(X_train1, y_train1)
    ans = linreg.predict(poly.fit_transform(predict).reshape(-1,1))
np.concatenate(ans1, ans3, ans6, ans9)
In my code, I am trying to append all the values to a list for later use, but I get an error:
ValueError Traceback (most recent call last)
<ipython-input-3-bca8e3056e3a> in <module>()
18
19 x_poly1 = poly1.fit_transform(X_train).reshape(-1,1)
---> 20 X_train1, X_test1, y_train1, y_test1 = train_test_split(x_poly1, y_train)
21 linreg1 = LinearRegression().fit(X_train1, y_train1)
22
/opt/conda/lib/python3.6/site-packages/sklearn/model_selection/_split.py in train_test_split(*arrays, **options)
1687 test_size = 0.25
1688
-> 1689 arrays = indexable(*arrays)
1690
1691 if stratify is not None:
/opt/conda/lib/python3.6/site-packages/sklearn/utils/validation.py in indexable(*iterables)
204 else:
205 result.append(np.array(X))
--> 206 check_consistent_length(*result)
207 return result
208
/opt/conda/lib/python3.6/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
179 if len(uniques) > 1:
180 raise ValueError("Found input variables with inconsistent numbers of"
--> 181 " samples: %r" % [int(l) for l in lengths])
182
183
ValueError: Found input variables with inconsistent numbers of samples: [22, 11]
What is the reason I am getting this error? I want the end result to be an array with shape (4, 100). Please ask if clarification is needed.
Upvotes: 1
Views: 1833
Reputation: 2871
There are a few things that look incorrect, but it's hard to tell without a simple example we can recreate and the full error message indicating which line is producing it. I can see, though, that poly1.fit_transform is called 4 times - I suspect this is a copy-paste error - and that predictions.append(ans1, ans3, ans6, ans9) should probably be np.concatenate((ans1, ans3, ans6, ans9), axis=1).
But you can see from the stack trace that the error is from line 20, the call to train_test_split. What it's saying is that the lengths of X and y aren't consistent. The reason is the reshape you've added to poly1.fit_transform(X_train). In this case, it takes the output of the transform, which has shape (n, 2) - a matrix - and reshapes it into (2*n, 1), which has twice as many rows as the original X_train (and as y_train, hence the 22 vs. 11 in the error).
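Here is a minimal sketch of that mismatch (the 11-sample size is made up to match the 22 vs. 11 in your traceback):
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_train = np.arange(11).reshape(-1, 1)   # shape (11, 1)
y_train = np.arange(11).reshape(-1, 1)   # shape (11, 1)

poly1 = PolynomialFeatures(degree=1)
features = poly1.fit_transform(X_train)  # shape (11, 2): a bias column plus x
reshaped = features.reshape(-1, 1)       # shape (22, 1): twice as many rows as y_train

print(features.shape, reshaped.shape, y_train.shape)
# (11, 2) (22, 1) (11, 1) -> train_test_split(reshaped, y_train) raises the ValueError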
I would recommend learning how to create a Pipeline to combine the PolynomialFeatures and LinearRegression into a single object which you can fit and predict. For example, take a look at Polynomial interpolation:
model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
This code should work
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import numpy as np
X_train = np.ones(200)
y_train = np.ones(200)
poly1 = PolynomialFeatures(degree=1)
poly3 = PolynomialFeatures(degree=3)
poly6 = PolynomialFeatures(degree=6)
poly9 = PolynomialFeatures(degree=9)
X_train = X_train.reshape(-1,1)
y_train = y_train.reshape(-1,1)
predictions = []
predict = np.linspace(0,10,100)
x_poly1 = poly1.fit_transform(X_train) # removed reshape
X_train1, X_test1, y_train1, y_test1 = train_test_split(x_poly1, y_train)
linreg1 = LinearRegression().fit(X_train1, y_train1)
x_poly3 = poly3.fit_transform(X_train) # removed reshape
X_train3, X_test3, y_train3, y_test3 = train_test_split(x_poly3, y_train)
linreg3 = LinearRegression().fit(X_train3, y_train3)
x_poly6 = poly6.fit_transform(X_train) # removed reshape
X_train6, X_test6, y_train6, y_test6 = train_test_split(x_poly6, y_train)
linreg6 = LinearRegression().fit(X_train6, y_train6)
x_poly9 = poly9.fit_transform(X_train) # removed reshape
X_train9, X_test9, y_train9, y_test9 = train_test_split(x_poly9, y_train)
linreg9 = LinearRegression().fit(X_train9, y_train9) # fixed incorrect X,y
predict1 = poly1.fit_transform(predict.reshape(-1,1))
predict3 = poly3.fit_transform(predict.reshape(-1,1)) # changed poly1 to poly3
predict6 = poly6.fit_transform(predict.reshape(-1,1)) # changed poly1 to poly6
predict9 = poly9.fit_transform(predict.reshape(-1,1)) # changed poly1 to poly9
ans1 = linreg1.predict(predict1)
ans3 = linreg3.predict(predict3)
ans6 = linreg6.predict(predict6)
ans9 = linreg9.predict(predict9)
np.concatenate((ans1, ans3, ans6, ans9), axis=1) # use concatenate
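Note that np.concatenate(..., axis=1) gives an array of shape (100, 4); if you specifically want the (4, 100) shape you mentioned, you can simply transpose the result (this assumes each ansN keeps the (100, 1) shape it has in the code above):
answers = np.concatenate((ans1, ans3, ans6, ans9), axis=1)  # shape (100, 4)
answers = answers.T                                         # shape (4, 100)
print(answers.shape)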
And using pipelines it becomes
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
import numpy as np
X = np.ones((20,1))
y = np.ones((20,1))
X_train, X_test, y_train, y_test = train_test_split(X, y)
def create_model(degree):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    return model
predictions = []
for degree in [1,3,6,9]:
    model = create_model(degree)
    model.fit(X_train,y_train)
    predicted = model.predict(X_test)
    predictions.append(predicted)
predictions = np.concatenate(predictions, axis=1)
print(predictions.shape)
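If what you ultimately want is the (4, 100) array of predictions over the np.linspace(0, 10, 100) grid from your question (rather than over X_test), the same loop works on that grid. A sketch, reusing create_model, X_train and y_train from the block above:
predict = np.linspace(0, 10, 100).reshape(-1, 1)  # evaluation grid from the question

grid_predictions = []
for degree in [1, 3, 6, 9]:
    model = create_model(degree)
    model.fit(X_train, y_train)
    grid_predictions.append(model.predict(predict))  # each is (100, 1) since y_train is 2-D

grid_predictions = np.concatenate(grid_predictions, axis=1).T  # shape (4, 100)
print(grid_predictions.shape)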
Upvotes: 4