Skinishh

Reputation: 59

How are categorical features encoded in LightGBM?

LightGBM supports categorical variables. I would like to know how it encodes them. It doesn't seem to be one-hot encoding, since the algorithm is pretty fast (I tried it on data that took a long time to one-hot encode).

Upvotes: 3

Views: 10921

Answers (1)

BugKiller

Reputation: 1488

https://github.com/Microsoft/LightGBM/issues/699#issue-243313657

The basic idea is to sort the histogram by its accumulated values (sum_gradient / sum_hessian) and then find the best split on the sorted histogram, just as for numerical features.
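To make that concrete, here is a rough sketch of the idea in plain NumPy. It is not LightGBM's actual code: the function name `best_categorical_split`, the `reg_lambda` parameter, and the gain formula (the usual second-order gradient boosting gain) are my own assumptions for illustration, and the real implementation has further details that are not covered here.

```python
import numpy as np

def best_categorical_split(categories, gradients, hessians, reg_lambda=1.0):
    """Toy sorted-histogram split search for one categorical feature.

    Hypothetical helper for illustration only, not part of LightGBM.
    """
    cats = np.unique(categories)

    # Build the per-category histogram: summed gradients and hessians.
    sum_g = np.array([gradients[categories == c].sum() for c in cats])
    sum_h = np.array([hessians[categories == c].sum() for c in cats])

    # Sort the categories by sum_gradient / sum_hessian.
    order = np.argsort(sum_g / sum_h)
    cats, sum_g, sum_h = cats[order], sum_g[order], sum_h[order]

    total_g, total_h = sum_g.sum(), sum_h.sum()

    def score(g, h):
        # Assumed leaf score: standard second-order gain term.
        return g * g / (h + reg_lambda)

    # Scan the sorted histogram left to right, as for a numerical feature:
    # every prefix of the sorted categories is a candidate "left" group.
    best_gain, best_left = -np.inf, None
    left_g = left_h = 0.0
    for i in range(len(cats) - 1):
        left_g += sum_g[i]
        left_h += sum_h[i]
        gain = (score(left_g, left_h)
                + score(total_g - left_g, total_h - left_h)
                - score(total_g, total_h))
        if gain > best_gain:
            best_gain, best_left = gain, set(cats[: i + 1].tolist())
    return best_left, best_gain

# Made-up data: four categories with per-row gradients and unit hessians.
cats = np.array(["a", "a", "b", "b", "c", "d"])
grad = np.array([-1.0, -1.0, 1.0, 1.0, -0.9, 1.1])
hess = np.ones_like(grad)

left, gain = best_categorical_split(cats, grad, hess)
print(sorted(left))  # ['a', 'c']
```

Because the categories are sorted first, only k - 1 candidate splits have to be checked for k categories, instead of every possible subset. That fits the speed observed in the question, where no one-hot expansion of the data is needed.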

Upvotes: 1
