Reputation: 2085
I am trying to do some basic sklearn stuff with a single X variable and a single Y variable. Since I predict with a single column, I have to transform X into a 2D array. Now I want to predict a single value, but my model only allows me to predict from an array of length 32.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
import numpy as np
df = pd.read_csv("https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv")
df
X = df["mpg"].values.reshape(1, -1)
y = df["cyl"].values.reshape(1, -1)
y
clf = RandomForestClassifier(random_state=0)
clf.fit(X, y)
clf.predict([[35]])
ValueError: Number of features of the model must match the input. Model n_features is 32 and input n_features is 1
Can anyone help me to solve this problem?
Upvotes: 1
Views: 870
Reputation: 46908
You fitted the model on data of the wrong shape. If you do:
X = df["mpg"].values.reshape(1, -1)
y = df["cyl"].values.reshape(1, -1)
X.shape
(1, 32)
This means X has 1 observation and 32 predictors, whereas what you actually have is 1 predictor and 32 observations.
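To make the shape difference concrete, here is a minimal sketch using a made-up array in place of df["mpg"].values (the name `values` and its contents are assumptions for illustration only). sklearn reads the first axis as samples and the second as features, so the two reshape calls describe very different datasets:

```python
import numpy as np

# Hypothetical stand-in for df["mpg"].values: 32 measurements of one variable.
values = np.arange(32, dtype=float)

as_row = values.reshape(1, -1)     # 1 sample with 32 features
as_column = values.reshape(-1, 1)  # 32 samples with 1 feature each

print(as_row.shape)     # (1, 32)
print(as_column.shape)  # (32, 1)
```

reshape(-1, 1) is the variant that matches "one predictor, many observations".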
So it should be:
X = df[["mpg"]]
y = df["cyl"]
clf = RandomForestClassifier(random_state=0)
clf.fit(X, y)
Then predict using:
clf.predict(np.array(35).reshape(-1,1))
array([4])
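If you want to check this without downloading the CSV, the following is a rough end-to-end sketch on synthetic data. The data, the class thresholds, and the variable names are invented for illustration and are not the mtcars values; `n_features_in_` is assumed to be available, which holds for recent scikit-learn releases.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption, not mtcars): one feature, 32 observations.
rng = np.random.default_rng(0)
X = rng.uniform(10, 35, size=32).reshape(-1, 1)  # shape (32, 1)
# Made-up labels derived from the feature, kept 1D with shape (32,).
y = np.where(X[:, 0] > 25, 4, np.where(X[:, 0] > 18, 6, 8))

clf = RandomForestClassifier(random_state=0)
clf.fit(X, y)

# A single new observation must also be 2D: 1 sample, 1 feature.
single = np.array([[35.0]])
pred = clf.predict(single)

print(clf.n_features_in_)  # number of features the model was fitted on
print(pred.shape)          # one prediction per input row
```

The model now expects exactly one feature per row, so any input of shape (n, 1) can be passed to predict.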
Upvotes: 2