How does LightGBM encode categorical features?

I have the following structure for one LightGBM tree: {'split_index': 0, 'split_feature': 41, 'split_gain': 97.25859832763672, 'threshold': '3||4||8', 'decision_type': '==', 'default_left': False, 'missing_type': 'None', 'internal_value': 0, 'internal_weight': 0, 'internal_count': 73194, 'left_child': {'split_index': 1,

The feature at node 0 is categorical, and I pass it to the model with the pandas "category" dtype. Where can I find the mapping between the numbers in the threshold and the original categories?

Upvotes: 0

Views: 348

Answers (1)

user19410760

Reputation: 26

The numbers you see in the threshold are the values of the cat.codes attribute of your pandas categorical column. For example:

import pandas as pd

s = pd.Series(['a', 'b', 'a', 'a', 'b'], dtype='category')
print(s.cat.codes)
# 0    0
# 1    1
# 2    0
# 3    0
# 4    1
# dtype: int8

In this case 0 is 'a' and 1 is 'b'. You can build a mapping from each category code to its value with something like the following:

dict(enumerate(s.cat.categories))
# {0: 'a', 1: 'b'}
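
Applied to the tree in the question, the threshold '3||4||8' is a list of category codes separated by '||'. A minimal sketch for decoding it follows. Note that decode_threshold is a hypothetical helper name, and the categories list is made-up sample data standing in for list(df[col].cat.categories) of the column that was feature 41 at training time.

```python
from typing import List, Sequence


def decode_threshold(threshold: str, categories: Sequence[str]) -> List[str]:
    """Translate a categorical threshold such as '3||4||8' into category values."""
    # Each '||'-separated token is an integer category code.
    codes = [int(code) for code in threshold.split("||")]
    # A code is a position in the ordered list of categories.
    return [categories[code] for code in codes]


# Made-up categories for illustration; in practice use
# list(df[col].cat.categories) from the training DataFrame.
categories = list("abcdefghij")

decoded = decode_threshold("3||4||8", categories)
print(decoded)  # ['d', 'e', 'i']
```

The categories must come from the same data (or the same category ordering) that the model was trained on, otherwise the codes will point at the wrong values.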

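If the categories in your column at prediction time don't match the ones stored in the model, LightGBM will reset them to the categories seen during training, so the codes stay consistent with the tree thresholds.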
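
To see why the categories need to match, here is a small pandas-only sketch. It does not call LightGBM at all; it only assumes that the update behaves like pandas' set_categories, and train_categories is a made-up list standing in for the categories seen at training time.

```python
import pandas as pd

# Hypothetical categories the model saw during training.
train_categories = ['a', 'b', 'c']

# New data that happens to contain only 'b' and 'c'.
new = pd.Series(['b', 'c', 'b'], dtype='category')
raw_codes = new.cat.codes.tolist()
print(raw_codes)  # [0, 1, 0] -> 'b' is code 0 here, not 1

# Re-aligning to the training categories restores the original codes.
aligned = new.cat.set_categories(train_categories)
aligned_codes = aligned.cat.codes.tolist()
print(aligned_codes)  # [1, 2, 1]

# A value that was never seen in training has no code and becomes -1 (missing).
unseen = pd.Series(['b', 'z'], dtype='category').cat.set_categories(train_categories)
unseen_codes = unseen.cat.codes.tolist()
print(unseen_codes)  # [1, -1]
```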

Upvotes: 1
