Reputation: 2766
I'm fitting an xgboost model to some data stored in a dataframe. After fitting, I would like to run the .predict method of the classifier/regressor on a single row from the dataframe.
The following is a minimal example, which predicts fine on the full dataframe, yet crashes when run on only the second row of the dataframe.
from sklearn.datasets import load_iris
import xgboost
# Load iris data such that X is a dataframe
X, y = load_iris(return_X_y=True, as_frame=True)
clf = xgboost.XGBClassifier()
clf.fit(X, y)
# Predict for all rows - works fine
y_pred = clf.predict(X)
# Predict for single row. Crashes.
# Error: '('Expecting 2 dimensional numpy.ndarray, got: ', (4,))'
secondrow = X.iloc[1]
secondpred = clf.predict(secondrow)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-45-a06c6820c458> in <module>
11 # Error: '('Expecting 2 dimensional numpy.ndarray, got: ', (4,))'
12 secondrow = X.iloc[1]
---> 13 secondpred = clf.predict(secondrow)
e:\Anaconda3\envs\py37\lib\site-packages\xgboost\sklearn.py in predict(self, data, output_margin, ntree_limit, validate_features)
789 output_margin=output_margin,
790 ntree_limit=ntree_limit,
--> 791 validate_features=validate_features)
792 if output_margin:
793 # If output_margin is active, simply return the scores
e:\Anaconda3\envs\py37\lib\site-packages\xgboost\core.py in predict(self, data, output_margin, ntree_limit, pred_leaf, pred_contribs, approx_contribs, pred_interactions, validate_features)
1282
1283 if validate_features:
-> 1284 self._validate_features(data)
1285
1286 length = c_bst_ulong()
e:\Anaconda3\envs\py37\lib\site-packages\xgboost\core.py in _validate_features(self, data)
1688
1689 raise ValueError(msg.format(self.feature_names,
-> 1690 data.feature_names))
1691
1692 def get_split_value_histogram(self, feature, fmap='', bins=None, as_pandas=True):
ValueError: feature_names mismatch: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'] ['f0', 'f1', 'f2', 'f3']
expected petal length (cm), petal width (cm), sepal length (cm), sepal width (cm) in input data
training data did not have the following fields: f1, f3, f0, f2
Upvotes: 3
Views: 8750
Reputation: 62413
predict expects an array of a specific shape, based upon the model fit. secondrow is a one-dimensional pandas.Series, which does not match the shape of the model.
X.iloc[1]
sepal length (cm) 4.9
sepal width (cm) 3.0
petal length (cm) 1.4
petal width (cm) 0.2
Name: 1, dtype: float64
# look at the array
X.iloc[1].values
array([4.9, 3. , 1.4, 0.2]) # note this is a 1-d array
# look at the shape
secondrow.values.shape
(4,)
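The solution is to pass a two-dimensional object, such as a one-row dataframe, to .predict.
import pandas as pd
# wrap the Series in a dataframe and transpose it into a single row
secondrow = pd.DataFrame(X.iloc[1]).T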
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
1                4.9               3.0                1.4               0.2
# look at secondrow as an array
secondrow.values
array([[4.9, 3. , 1.4, 0.2]]) # note this is a 2-d array
# look at the shape
secondrow.values.shape
(1, 4)
# predict
secondpred = clf.predict(secondrow)
# result
array([0])
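As an alternative to transposing, indexing with a list or a slice should keep the row as a one-row dataframe, column names included. The sketch below is a pandas-only illustration of the shapes involved. The small two-row frame is a stand-in for the iris data, and the variable names row_as_series, row_as_frame and row_as_slice are hypothetical, not from the original code.

```python
import pandas as pd

# Stand-in for the iris frame: two rows, column names as in the question.
X = pd.DataFrame(
    {
        "sepal length (cm)": [5.1, 4.9],
        "sepal width (cm)": [3.5, 3.0],
        "petal length (cm)": [1.4, 1.4],
        "petal width (cm)": [0.2, 0.2],
    }
)

row_as_series = X.iloc[1]    # scalar index -> 1-d Series
row_as_frame = X.iloc[[1]]   # list index   -> one-row DataFrame
row_as_slice = X.iloc[1:2]   # slice        -> one-row DataFrame

print(row_as_series.shape)   # (4,)
print(row_as_frame.shape)    # (1, 4)
print(row_as_slice.shape)    # (1, 4)

# The one-row frames keep the original column names.
print(list(row_as_frame.columns) == list(X.columns))  # True
```

With the fitted model from the question, clf.predict(X.iloc[[1]]) would then receive a frame with the same layout as the training data. That call is not run here, since the sketch avoids the xgboost dependency.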
Upvotes: 6