juliettegudknecht

Reputation: 9

Using a Categorical Variable Predictor in XGBoost Algorithm

I am trying to use a categorical predictor in an xgboost algorithm, but keep getting errors. Here are the relevant parts of my code.

df = data[["country_name", "Timestamp", "Flow Duration", "Flow IAT Min", "Src Port", "Tot Fwd Pkts", "Init Bwd Win Byts", "Label"]]
from pandas.api.types import CategoricalDtype
df["country_name"] = df["country_name"].astype(CategoricalDtype(ordered=True))

X = df[["country_name", "Flow Duration", "Flow IAT Min", "Src Port", "Tot Fwd Pkts", "Init Bwd Win Byts"]]
df["Label"] = df["Label"].replace(['benign','ddos'],[0,1])
y = df["Label"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

model2 = xgb.XGBClassifier(tree_method="gpu_hist", enable_categorical=True, use_label_encoder = False)

model2.fit(X_train,y_train)

I also tried .astype("category"), and that didn't work either. I keep getting this error when I run the last line (model2.fit):

ValueError: DataFrame.dtypes for data must be int, float, bool or categorical.  When
                categorical type is supplied, DMatrix parameter
                `enable_categorical` must be set to `True`.country_name

Any help would be appreciated, thank you!!

Upvotes: 0

Views: 1217

Answers (2)

Jon

Reputation: 1

You can build the DMatrix explicitly; that is where you need to enable categorical support.

For example:

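import xgboost as xgb
from sklearn.model_selection import train_test_split

# x_subfeatures / y_encoded: your feature DataFrame and numeric labels
# (assumed here: categorical columns already have dtype "category")
train_x, valid_x, train_y, valid_y = train_test_split(x_subfeatures, y_encoded, train_size=.75)

dtrain = xgb.DMatrix(
    train_x,
    label=train_y,
    # enable categorical data
    enable_categorical=True
)

dvalid = xgb.DMatrix(
    valid_x,
    label=valid_y,
    enable_categorical=True
)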

Upvotes: 0

powerdaten

Reputation: 1

Ideally, check the .dtypes of all your relevant predictors and attach the output to the question.

In this specific case, country_name might still be of object dtype when it reaches the model, in which case you would need to encode this variable first.

To encode, you can choose among the following: https://contrib.scikit-learn.org/category_encoders/
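
The dtype check described above can be sketched with pandas alone. This is only an illustration on a made-up frame (the values are hypothetical); it lists the columns that are neither numeric nor categorical, then converts them to the category dtype, with integer codes shown as one simple ordinal-style encoding:

```python
import pandas as pd

# Toy frame (hypothetical values) with one raw string column.
frame = pd.DataFrame({
    "country_name": ["US", "DE", "US", "BR"],
    "Src Port": [443, 80, 22, 443],
})

# Inspect the dtypes first; string columns are the ones XGBoost rejects.
print(frame.dtypes)

# Columns that are neither numeric/bool nor already categorical.
text_cols = [
    col for col in frame.columns
    if not pd.api.types.is_numeric_dtype(frame[col])
    and not isinstance(frame[col].dtype, pd.CategoricalDtype)
]

# Convert those columns on a copy so the original frame is untouched.
converted = frame.copy()
for col in text_cols:
    converted[col] = converted[col].astype("category")

# Integer codes per category, usable as a plain numeric feature if needed.
codes = converted["country_name"].cat.codes
```

Working on an explicit copy also avoids pandas' SettingWithCopyWarning, which can appear when assigning into a frame that was sliced from another one.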

Upvotes: -1
