user84592

Reputation: 4882

Relation between coefficients in linear regression and feature importance in decision trees

I recently started a machine learning (ML) project in which I need to identify the features (inputs a1, a2, a3, ..., an) that have a large impact on the target (output).

I used linear regression to get the coefficient of each feature, and a decision-tree-based algorithm (for example, Random Forest Regressor) to get the feature importances.

Is my understanding correct that a feature with a large coefficient in linear regression should also rank near the top of the feature importances from the decision-tree algorithm?

Upvotes: 1

Views: 1250

Answers (2)

Arad Haselirad

Reputation: 324

The short answer is no, not necessarily. We do not know what your inputs are, or whether they share the same units and range of variation. I am also not sure why you have combined linear regression with decision trees, but I will assume you have a working model, say a linear regression that gives good accuracy on the test set. Given what you are asking, you probably want a sensitivity analysis based on the fitted model. I would suggest reading about the "SALib" library and about sensitivity analysis in general.
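To illustrate the "not necessarily" part, here is a minimal sketch on made-up data (the target formula, sample size and model settings are all assumptions chosen for illustration, not taken from the question). One feature affects the target only through its square, so a linear fit gives it a coefficient near zero, while a random forest can still rank it as the most important feature.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-1.0, 1.0, size=(n, 2))

# Hypothetical target: x0 acts only through its square, x1 acts linearly.
y = 3.0 * X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(0.0, 0.05, size=n)

lin = LinearRegression().fit(X, y)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

coef = lin.coef_                          # one slope per feature
importance = forest.feature_importances_  # impurity-based, sums to 1

print("linear coefficients:", coef)
print("forest importances: ", importance)
```

On data like this the two rankings disagree: the linear model sees almost no linear trend in x0, while the forest splits on it heavily. A real dataset may or may not behave this way, which is why the answer is "not necessarily" rather than a flat "no".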

Upvotes: 2

Ahmed Ragab

Reputation: 846

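Not really. If your input features are not normalized, the magnitude of each coefficient reflects the feature's scale (its units and standard deviation) as much as its influence on the target, so a relatively big coefficient does not by itself mean a relatively important feature. If your features are normalized, then yes, a large coefficient can be an indicator of the feature's importance, but there are still other things to consider.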
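A small sketch of the scaling effect, using only NumPy and synthetic data (the feature names, units and the helper `fit_coefs` are hypothetical). The same feature is expressed in two different units; the fitted coefficient changes by the unit factor even though the data and the feature's influence are identical. After standardizing, the coefficients become comparable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
length_m = rng.normal(0.0, 1.0, size=n)  # hypothetical feature, in metres
weight = rng.normal(0.0, 1.0, size=n)    # hypothetical second feature
y = 2.0 * length_m + 2.0 * weight + rng.normal(0.0, 0.1, size=n)


def fit_coefs(X, y):
    """Ordinary least squares with an intercept; returns only the slopes."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[1:]


X_m = np.column_stack([length_m, weight])
X_mm = np.column_stack([length_m * 1000.0, weight])  # same data, length in mm

coef_m = fit_coefs(X_m, y)    # length coefficient per metre
coef_mm = fit_coefs(X_mm, y)  # length coefficient per millimetre

# Standardize each column to zero mean and unit standard deviation.
X_std = (X_mm - X_mm.mean(axis=0)) / X_mm.std(axis=0)
coef_std = fit_coefs(X_std, y)

print("metres:       ", coef_m)
print("millimetres:  ", coef_mm)
print("standardized: ", coef_std)
```

Both features contribute equally to `y` by construction, yet in the millimetre version the length coefficient is a thousand times smaller than the weight coefficient. Only the standardized fit puts the two on a comparable footing.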

You could also try some of the classes in sklearn's feature selection module (`sklearn.feature_selection`), which should do this automatically for you.
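As one possible way to use that module, here is a sketch with `SelectFromModel` wrapped around a random forest. The data is synthetic and the choice of estimator and of `threshold="mean"` is an assumption for illustration; other selectors in `sklearn.feature_selection` may suit your data better.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))

# Hypothetical target: only features 0 and 2 matter, with equal weight.
y = 3.0 * X[:, 0] + 3.0 * X[:, 2] + rng.normal(0.0, 0.1, size=500)

# Keep features whose importance is at least the mean importance.
selector = SelectFromModel(
    RandomForestRegressor(n_estimators=100, random_state=0),
    threshold="mean",
)
selector.fit(X, y)

selected = np.flatnonzero(selector.get_support())  # indices of kept features
X_reduced = selector.transform(X)                  # columns for those features

print("selected feature indices:", selected)
```

The selection here is driven by the forest's impurity-based importances, so it inherits their limitations; treat the result as a starting point rather than a final answer.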

Upvotes: 2
