Reputation: 3
I have the following structure of one LightGBM tree:
{'split_index': 0, 'split_feature': 41, 'split_gain': 97.25859832763672, 'threshold': '3||4||8', 'decision_type': '==', 'default_left': False, 'missing_type': 'None', 'internal_value': 0, 'internal_weight': 0, 'internal_count': 73194, 'left_child': {'split_index': 1,
The feature at node 0 is categorical, and I pass it to the model as a pandas "category" column. Where can I find the mapping between the numbers in the threshold and the actual categories?
Upvotes: 0
Views: 348
Reputation: 26
The numbers you see are the values of the codes attribute (s.cat.codes) of your categorical features. For example:
import pandas as pd
s = pd.Series(['a', 'b', 'a', 'a', 'b'], dtype='category')
print(s.cat.codes)
# 0 0
# 1 1
# 2 0
# 3 0
# 4 1
# dtype: int8
So in this case 0 is 'a' and 1 is 'b'.
You can build a mapping from the category code to the value with something like the following:
dict(enumerate(s.cat.categories))
# {0: 'a', 1: 'b'}
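As a rough sketch, the same mapping can be used to decode a threshold string such as '3||4||8' from the tree dump. The helper name decode_threshold and the ten-category example column are made up for illustration, and this assumes the categories you pass in are the ones (in the same order) that the training column had:

```python
import pandas as pd

def decode_threshold(threshold, categories):
    """Translate a '||'-separated string of category codes into labels."""
    code_to_label = dict(enumerate(categories))
    return [code_to_label[int(code)] for code in threshold.split("||")]

# Hypothetical column standing in for the training feature (codes 0..9).
s = pd.Series(list("abcdefghij"), dtype="category")

labels = decode_threshold("3||4||8", s.cat.categories)
print(labels)
# ['d', 'e', 'i']
```

With decision_type '==', the listed categories are the ones that satisfy the split condition at that node.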
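If the categories in the column you pass at prediction time don't match the ones stored in the model, LightGBM will update the column to use the model's categories, so the codes stay consistent with training.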
Upvotes: 1