jean

Reputation: 141

Will correlation impact feature importance of ML models?

I am building an XGBoost model with hundreds of features. For features that are highly correlated with each other (by Pearson correlation), I am thinking of using feature importance (measured by gain) to drop the one with lower importance. My questions: 1. Will correlation impact/bias feature importance (measured by gain)? 2. Is there a good way to remove highly correlated features for ML models?

Example: a's importance = 120, b's importance = 14, corr(a, b) = 0.8. I am thinking of dropping b because its importance is only 14. Is that correct?

Thank you.

Upvotes: 0

Views: 1761

Answers (1)

hafiz031

Reputation: 2670

Correlation definitely impacts feature importance. If features are highly correlated, there is a high level of redundancy in keeping them all: because two features are correlated, a change in one implies a change in the other, so they are largely representative of one another. There is no need to keep all of them; using just a few of them, you can hopefully classify your data well.

So in order to remove highly correlated features you can:

  1. Use PCA to reduce dimensionality, or,
  2. Use a decision tree to find the important features, or,
  3. Manually choose, from domain knowledge (if possible), the features that are most promising for classifying your data, or,
  4. Manually combine some features into a new feature, so that this single feature makes another set of features unnecessary because they can largely be inferred from it.
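A filtering approach in the spirit of the options above can be sketched as follows: for each pair of features whose absolute Pearson correlation exceeds a threshold, drop the one with the lower importance score. The function name, threshold, and importance values are illustrative assumptions, not a standard API.

```python
# Hypothetical sketch: drop one feature from each highly correlated pair,
# keeping the one with the higher importance (e.g. XGBoost gain).
import numpy as np
import pandas as pd

def drop_correlated(X: pd.DataFrame, importance: dict, threshold: float = 0.8):
    """Return (filtered DataFrame, list of dropped columns)."""
    corr = X.corr(method="pearson").abs()
    to_drop = set()
    cols = list(X.columns)
    for i, c1 in enumerate(cols):
        for c2 in cols[i + 1:]:
            if c1 in to_drop or c2 in to_drop:
                continue  # already handled by an earlier pair
            if corr.loc[c1, c2] > threshold:
                # drop the less important feature of the pair
                to_drop.add(c1 if importance.get(c1, 0) < importance.get(c2, 0) else c2)
    return X.drop(columns=sorted(to_drop)), sorted(to_drop)

# Toy usage mirroring the question's example: corr(a, b) is high,
# a's importance = 120, b's importance = 14, so b gets dropped.
rng = np.random.default_rng(1)
a = rng.normal(size=500)
df = pd.DataFrame({"a": a,
                   "b": 0.9 * a + 0.3 * rng.normal(size=500),
                   "c": rng.normal(size=500)})
kept, dropped = drop_correlated(df, {"a": 120, "b": 14, "c": 30}, threshold=0.8)
print(dropped)  # → ['b']
```

This is a greedy pairwise rule, so the result can depend on column order when more than two features are mutually correlated; PCA (option 1) sidesteps that by transforming the features instead of dropping them.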

Upvotes: 1
