Reputation: 59
LightGBM has support for categorical variables. I would like to know how it encodes them. It doesn't seem to use one-hot encoding, since the algorithm is quite fast (I tried it with data that took a long time to one-hot encode).
Upvotes: 3
Views: 10921
Reputation: 1488
https://github.com/Microsoft/LightGBM/issues/699#issue-243313657
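The basic idea is to sort the histogram bins of the categorical feature by their accumulated values (sum_gradient / sum_hessian), then find the best split on the sorted histogram, just as for numerical features. So no one-hot encoding is involved: for a feature with k categories, the search costs roughly O(k log k) for the sort instead of trying all 2^(k-1) - 1 ways to partition the categories into two groups.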
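A minimal Python sketch of the idea follows. It assumes the usual second-order leaf score G^2 / (H + lambda) used in gradient boosting. The function name `best_categorical_split` and the `reg_lambda` parameter are hypothetical. This is a simplification, not LightGBM's actual code, which adds extras such as the `cat_smooth`, `max_cat_threshold` and `min_data_per_group` parameters that are left out here.

```python
from collections import defaultdict


def best_categorical_split(categories, gradients, hessians, reg_lambda=1.0):
    """Simplified sketch of a sorted-histogram categorical split.

    Returns (left_categories, gain): the set of categories sent to the
    left child and the gain of that split (hypothetical helper).
    """
    # 1. Build a per-category histogram of gradient and hessian sums.
    sum_g = defaultdict(float)
    sum_h = defaultdict(float)
    for c, g, h in zip(categories, gradients, hessians):
        sum_g[c] += g
        sum_h[c] += h

    # 2. Sort categories by their accumulated value sum_gradient / sum_hessian.
    order = sorted(sum_g, key=lambda c: sum_g[c] / sum_h[c])

    total_g = sum(sum_g.values())
    total_h = sum(sum_h.values())

    def score(g, h):
        # Second-order leaf score G^2 / (H + lambda); reg_lambda is an assumption.
        return g * g / (h + reg_lambda)

    parent = score(total_g, total_h)

    # 3. Scan the sorted bins like an ordered numerical feature:
    #    each prefix of the sorted order is a candidate left child.
    best_gain, best_k = float("-inf"), None
    left_g = left_h = 0.0
    for k, c in enumerate(order[:-1], start=1):
        left_g += sum_g[c]
        left_h += sum_h[c]
        right_g, right_h = total_g - left_g, total_h - left_h
        gain = score(left_g, left_h) + score(right_g, right_h) - parent
        if gain > best_gain:
            best_gain, best_k = gain, k

    if best_k is None:  # only one category present: nothing to split
        return set(), 0.0
    return set(order[:best_k]), best_gain


# Toy data: squared loss with initial prediction 0, so gradient = -y, hessian = 1.
cats = ["a", "b", "c", "d", "a", "b", "c", "d"]
y = [1, 5, 1, 5, 1, 5, 1, 5]
left, gain = best_categorical_split(cats, [-v for v in y], [1.0] * len(y))
print(sorted(left), round(gain, 6))
```

On this toy data the sorted order is b, d, a, c, and the best prefix puts `b` and `d` (the high-target categories) in one child with a gain of about 19.2. Because only k - 1 prefixes of the sorted order are checked, the cost stays small even for features with many categories, which is consistent with the speed observed in the question.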
Upvotes: 1