Reputation: 385
I have trained models using several algorithms, including Random Forest from scikit-learn and LightGBM, and these models perform similarly in terms of accuracy and other metrics.
The issue is the inconsistent behavior of these two algorithms in terms of feature importance. I used default parameters, and I know that they use different methods to calculate feature importance, but I would expect the features most highly correlated with the target to also have the most influence on the model's predictions. The Random Forest result makes more sense to me because those highly correlated features appear at the top, while that is not the case for LightGBM.
Is there a way to explain this behavior, and is the LightGBM result trustworthy enough to present?
[Plot: Random Forest feature importance]
[Plot: LightGBM feature importance]
[Plot: Correlation with target]
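For context, a minimal sketch of the kind of comparison described above (the data and feature names here are synthetic stand-ins, not the actual dataset; both models use default parameters):

    import pandas as pd
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from lightgbm import LGBMClassifier

    # Synthetic data standing in for the original dataset
    X, y = make_classification(n_samples=1000, n_features=10,
                               n_informative=4, random_state=42)
    X = pd.DataFrame(X, columns=[f"f{i}" for i in range(10)])

    rf = RandomForestClassifier(random_state=42).fit(X, y)
    lgbm = LGBMClassifier(random_state=42).fit(X, y)

    importances = pd.DataFrame({
        "rf_importance": rf.feature_importances_,      # mean decrease in impurity
        "lgbm_importance": lgbm.feature_importances_,  # default: split counts per feature
        "corr_with_target": X.corrwith(pd.Series(y)),  # linear correlation with the label
    }, index=X.columns)

    print(importances.sort_values("corr_with_target", ascending=False))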
Upvotes: 4
Views: 6981
Reputation: 46
I have had a similar issue. The default feature importance for LGBM is based on 'split', and when I changed this to 'gain', the plots gave similar results.
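For illustration, a minimal sketch of where that switch lives (assuming `model` is an already-fitted `LGBMClassifier`):

    import lightgbm as lgb

    # 'split' (the default) counts how many times a feature is used in a split;
    # 'gain' sums the loss reduction contributed by each feature's splits.
    split_importance = model.booster_.feature_importance(importance_type="split")
    gain_importance = model.booster_.feature_importance(importance_type="gain")

    # The plotting helper accepts the same switch
    lgb.plot_importance(model, importance_type="gain")

Alternatively, passing `importance_type="gain"` to the `LGBMClassifier` constructor makes the `feature_importances_` attribute report gain instead of split counts.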
Upvotes: 3
Reputation: 20302
Well, GBMs are often shown to perform better than random forests, and that is especially true of LightGBM: a properly tuned LightGBM will most likely beat a random forest in both performance and speed.
GBM advantages:
More actively developed. Many new features have been added to modern GBM implementations (xgboost, lightgbm, catboost) that improve their performance, speed, and scalability.
GBM disadvantages:
A larger number of hyperparameters to tune
Tendency to overfit easily
If you aren't completely sure the hyperparameters are tuned correctly for LightGBM, stick with Random Forest, which will be easier to use and maintain.
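For reference, a rough sketch of what minimal tuning might look like (the parameter grid is illustrative, not a recommendation; `X` and `y` stand in for your training data):

    from lightgbm import LGBMClassifier
    from sklearn.model_selection import GridSearchCV

    param_grid = {
        "num_leaves": [15, 31, 63],          # main capacity control in LightGBM
        "learning_rate": [0.01, 0.05, 0.1],
        "n_estimators": [100, 300, 500],
    }

    search = GridSearchCV(
        LGBMClassifier(random_state=42),
        param_grid,
        cv=5,
        scoring="accuracy",
    )
    search.fit(X, y)
    print(search.best_params_)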
Upvotes: 2