mineral

Reputation: 529

I got this error 'DataFrame.dtypes for data must be int, float, bool or categorical'

I'm trying to train an XGBoost model on this data.


The 'start_time' and 'end_time' columns were in yyyy-mm-dd hh:mm:ss format.

I converted them to strings using astype(str) and then reformatted them to yyyymmddhhmmss using regular expressions.

xgb_model = xgboost.XGBClassifier(eta=0.1, nrounds=1000, max_depth=8, colsample_bytree=0.5, scale_pos_weight=1.1, booster='gbtree', 
                                  metric='multi:softmax')
hr_pred = xgb_model.fit(x_train, np.ravel(y_train, order='C')).predict(x_test)
print(classification_report(y_test, hr_pred))

But the following error occurred, and I've never seen anything like it before.

ValueError: DataFrame.dtypes for data must be int, float, bool or categorical.  When
            categorical type is supplied, DMatrix parameter
            `enable_categorical` must be set to `True`.start_time, end_time

How can I solve this problem?

Thanks for your help.

Upvotes: 13

Views: 54143

Answers (1)

Carlos Mougan

Reputation: 811

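It seems that you have categorical data: start_time and end_time are of object dtype because they were converted to strings, and XGBoost only accepts int, float, bool or categorical columns.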

You need either to drop them or to encode them.

To drop them, keep only the numeric columns when fitting and predicting:

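import numpy as np
import xgboost
from sklearn.metrics import classification_report
# _get_numeric_data() is a private pandas helper that keeps only numeric (and bool) columns
xgb_model = xgboost.XGBClassifier(eta=0.1, nrounds=1000, max_depth=8, colsample_bytree=0.5, scale_pos_weight=1.1, booster='gbtree', 
                                  metric='multi:softmax')
hr_pred = xgb_model.fit(x_train._get_numeric_data(), np.ravel(y_train, order='C')).predict(x_test._get_numeric_data())
print(classification_report(y_test, hr_pred))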
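Because `_get_numeric_data()` is a private pandas method, the same filtering can probably be done with the public `select_dtypes` method, which also makes it easy to see which columns get dropped. This is a minimal sketch: the small frame and its `value` and `flag` columns are made up for illustration and only stand in for the real `x_train`.

```python
import pandas as pd

# Hypothetical stand-in for x_train: two timestamp columns stored as strings
# plus two columns XGBoost can already handle.
x_train = pd.DataFrame({
    "start_time": ["20210101083000", "20210102221500"],
    "end_time": ["20210101091500", "20210102230000"],
    "value": [1.5, 2.0],
    "flag": [True, False],
})

# Keep only numeric and boolean columns using the public API.
numeric_only = x_train.select_dtypes(include=["number", "bool"])

# List the columns that were left out, so the drop is explicit.
dropped = x_train.columns.difference(numeric_only.columns)

print("kept:", list(numeric_only.columns))
print("dropped:", list(dropped))
```

The same call would need to be applied to `x_test` so that training and prediction see identical columns.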

To encode them, have a look at this library: https://contrib.scikit-learn.org/category_encoders/
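Since both columns hold timestamps rather than true categories, one alternative to a general-purpose category encoder is to parse them and derive numeric features with plain pandas. The sketch below assumes the yyyymmddhhmmss string format described in the question. The helper `encode_time_columns`, the derived feature names and the sample data are all hypothetical, and which features are actually useful depends on the problem.

```python
import pandas as pd

# Hypothetical sample mirroring the question: timestamps stored as
# yyyymmddhhmmss strings.
x_train = pd.DataFrame({
    "start_time": ["20210101083000", "20210102221500"],
    "end_time": ["20210101091500", "20210102230000"],
    "value": [1.5, 2.0],
})


def encode_time_columns(df, start="start_time", end="end_time"):
    """Replace two timestamp string columns with numeric features."""
    out = df.copy()
    start_ts = pd.to_datetime(out[start], format="%Y%m%d%H%M%S")
    end_ts = pd.to_datetime(out[end], format="%Y%m%d%H%M%S")

    # Illustrative features: hour of day, day of week (Monday=0),
    # and elapsed time in seconds.
    out["start_hour"] = start_ts.dt.hour
    out["start_dayofweek"] = start_ts.dt.dayofweek
    out["duration_s"] = (end_ts - start_ts).dt.total_seconds()

    # Drop the original string columns so only numeric dtypes remain.
    return out.drop(columns=[start, end])


encoded = encode_time_columns(x_train)
print(encoded.dtypes)
```

The resulting frame contains only numeric columns, so it could be passed to `fit` and `predict` in place of the original `x_train` and `x_test`, provided both are encoded the same way.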

Upvotes: 16
