pingboing

Reputation: 69

xgboost feature importance of categorical variable

I am using XGBClassifier to train in Python, and there are a handful of categorical variables in my training dataset. Originally, I planned to convert each of them into a few dummies before feeding in my data, but then the feature importance will be calculated for each dummy, not for the original categorical variables. Since I also need to rank all of my original variables (numerical + categorical) by importance, I am wondering how to get the importance of my original variables. Is it simply a matter of adding up?
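To illustrate the setup being described, here is a minimal sketch of one-hot encoding a categorical column with `pd.get_dummies` before training. The column names (`color`, `size`) and the data are hypothetical; the point is that each category becomes its own column, prefixed with the original column name, so the model reports one importance per dummy.

```python
import pandas as pd

# Hypothetical training data with one categorical and one numeric column.
df = pd.DataFrame({
    "color": ["red", "blue", "green", "red"],
    "size": [1.0, 2.5, 3.0, 0.5],
})

# get_dummies expands "color" into one column per category,
# prefixed with the original name: color_blue, color_green, color_red.
X = pd.get_dummies(df, columns=["color"])
print(sorted(X.columns))
# XGBClassifier fit on X would then expose one value per dummy column,
# e.g. via model.feature_importances_ aligned with X.columns.
```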

Upvotes: 1

Views: 1487

Answers (1)

boot-scootin

Reputation: 12515

You could probably get by with summing the individual categories' importances into their original, parent category. But, unless these features are high-cardinality, my two cents would be to report them individually. I tend to err on the side of being more explicit with reporting model performance/importance measures.
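The "summing" approach above can be sketched as follows. This assumes the dummies were created with `pd.get_dummies(..., columns=[...])`, so their names look like `<parent>_<category>`; the importance values here are made up for illustration, and in practice they would come from something like `dict(zip(X.columns, model.feature_importances_))`.

```python
from collections import defaultdict

# Hypothetical per-dummy importances, e.g. from a fitted XGBClassifier:
# dict(zip(X.columns, model.feature_importances_))
importances = {
    "size": 0.40,
    "color_red": 0.25,
    "color_blue": 0.20,
    "color_green": 0.15,
}
parents = {"color"}  # names of the original categorical columns

def aggregate(importances, parents):
    """Sum each dummy's importance into its parent categorical feature."""
    total = defaultdict(float)
    for name, score in importances.items():
        # A dummy maps back to its parent if its name starts with
        # "<parent>_"; any other feature is kept under its own name.
        parent = next((p for p in parents if name.startswith(p + "_")), name)
        total[parent] += score
    return dict(total)

agg = aggregate(importances, parents)
print(sorted(agg.items(), key=lambda kv: kv[1], reverse=True))
# "color" now carries the summed importance of its three dummies.
```

One caveat with the prefix-matching trick: it can mis-assign dummies if one categorical column's name is a prefix of another's (e.g. `color` and `color_group`), so keeping an explicit dummy-to-parent mapping is safer for messy column names.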

Upvotes: 0

Related Questions