Predict a single value with scikit-learn leads to ValueError

Question

I am trying some basic sklearn work with a single X variable and a single y variable. Since I predict from a single column, I have to transform X into a 2D array. Now I want to predict a single value, but my model only lets me predict from an array of length 32.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
import numpy as np

df = pd.read_csv("https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv")
df

X = df["mpg"].values.reshape(1, -1)
y = df["cyl"].values.reshape(1, -1)

y
clf = RandomForestClassifier(random_state=0)
clf.fit(X, y)

clf.predict([[35]])

ValueError: Number of features of the model must match the input. Model n_features is 32 and input n_features is 1

Can anyone help me to solve this problem?

StupidWolf · Accepted Answer

The model was fitted on data of the wrong shape. With your code:

X = df["mpg"].values.reshape(1, -1)
y = df["cyl"].values.reshape(1, -1)

X.shape
(1, 32)

This means X holds 1 observation with 32 predictors, whereas what you actually have is 1 predictor and 32 observations. scikit-learn expects X to have shape (n_samples, n_features).
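The difference between the two reshape calls can be checked with NumPy alone. This is a minimal sketch; the `mpg` array is a hypothetical stand-in for `df["mpg"].values`, not the real column.

```python
import numpy as np

# Hypothetical stand-in for df["mpg"].values: 32 values in a 1D array
mpg = np.linspace(10.0, 34.0, 32)

# reshape(1, -1): 1 row, 32 columns, i.e. one sample with 32 features
wrong = mpg.reshape(1, -1)

# reshape(-1, 1): 32 rows, 1 column, i.e. 32 samples with one feature
right = mpg.reshape(-1, 1)

print(wrong.shape)  # (1, 32)
print(right.shape)  # (32, 1)
```

Only the second layout matches the (n_samples, n_features) convention for a single predictor.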

So it should be:

X = df[["mpg"]]
y = df["cyl"]

clf = RandomForestClassifier(random_state=0)
clf.fit(X, y)

Then predict using:

clf.predict(np.array(35).reshape(-1,1))
array([4])
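As an alternative sketch, the single value can also be passed as a one-row DataFrame with the same column name used at fit time. The small `mpg`/`cyl` table below is made-up illustrative data so that the snippet runs without downloading the CSV; it is not the dataset from the question.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Made-up illustrative data (assumption): one predictor, nine observations
df_small = pd.DataFrame({
    "mpg": [33.9, 30.4, 27.3, 21.0, 19.7, 18.1, 15.2, 14.3, 10.4],
    "cyl": [4, 4, 4, 6, 6, 6, 8, 8, 8],
})

X = df_small[["mpg"]]   # double brackets keep X two-dimensional: shape (9, 1)
y = df_small["cyl"]     # 1D target: shape (9,)

clf = RandomForestClassifier(random_state=0)
clf.fit(X, y)

# One new observation with the same single feature: shape (1, 1)
new_obs = pd.DataFrame({"mpg": [35]})
pred = clf.predict(new_obs)
print(pred.shape)  # (1,)
```

Keeping the column name consistent should also avoid the feature-name warning that recent scikit-learn versions emit when a model fitted on a DataFrame is given a bare NumPy array.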
