Reputation: 4882
I recently started a machine learning (ML) project in which I need to identify the features (inputs a1, a2, a3, ..., an) that have a large impact on the target/output.
I used linear regression to get the coefficient of each feature, and a decision-tree-based algorithm (for example, Random Forest Regressor) to get the feature importances.
Is my understanding right that a feature with a large coefficient in linear regression should also be near the top of the feature importance list from the decision tree algorithm?
Upvotes: 1
Views: 1250
Reputation: 324
The short answer to your question is no, not necessarily, especially since we do not know what your inputs are, or whether they share the same units, range of variation, and so on. I am not sure why you have combined linear regression with a decision tree, but I will assume you have a working model, say a linear regression that gives good accuracy on the test set. From what you have asked, you probably need to run a sensitivity analysis on the fitted model. I would suggest reading about the "SALib" library and the subject of sensitivity analysis in general.
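To make the "not necessarily" concrete, here is a minimal sketch on made-up data (not your data, and not SALib). It assumes scikit-learn is available; the two features and the target formula are invented for illustration. Feature a1 affects the target only through its square, so the linear model gives it a coefficient close to zero, while the random forest ranks it first.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Synthetic data, invented for this illustration only.
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-1.0, 1.0, size=(n, 2))  # columns: a1, a2

# a1 acts only through its square; a2 has a weak linear effect.
y = 3.0 * X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(0.0, 0.05, size=n)

linear = LinearRegression().fit(X, y)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Rank features from most to least "important" under each model.
coef_rank = np.argsort(-np.abs(linear.coef_))
importance_rank = np.argsort(-forest.feature_importances_)

print("linear coefficients:", linear.coef_)
print("forest importances: ", forest.feature_importances_)
print("rank by |coefficient|:", coef_rank)
print("rank by importance:   ", importance_rank)
```

The two rankings disagree here because a linear coefficient only measures the straight-line part of a relationship, whereas tree splits can pick up non-linear effects.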
Upvotes: 2
Reputation: 846
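Not really. If your input features are not normalized, the size of a coefficient mostly reflects the feature's scale: a feature with a relatively small standard deviation can get a relatively big coefficient, and one with a large standard deviation a small one, regardless of how much either affects the target. If your features are normalized, then yes, the coefficient can be an indicator of a feature's importance, but there are still other things to consider.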
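As a rough illustration of the scaling point, the sketch below fits ordinary least squares twice with NumPy, once on raw features and once on standardized ones. The data, the units mentioned in the comments, and the helper `ols_coefficients` are all made up for this example.

```python
import numpy as np

# Synthetic data, invented for this illustration only.
rng = np.random.default_rng(0)
n = 1000
a1 = rng.normal(0.0, 1.0, size=n)      # small numeric range (say, metres)
a2 = rng.normal(0.0, 1000.0, size=n)   # large numeric range (say, millimetres)

# a2 drives most of the variation in y despite its tiny coefficient.
y = 1.0 * a1 + 0.005 * a2 + rng.normal(0.0, 0.1, size=n)
X = np.column_stack([a1, a2])


def ols_coefficients(X, y):
    """Least-squares slopes for y ~ intercept + X (intercept is dropped)."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:]


raw = ols_coefficients(X, y)

# Standardize each column to zero mean and unit standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
standardized = ols_coefficients(X_std, y)

print("raw coefficients:         ", raw)
print("standardized coefficients:", standardized)
```

On the raw features a1 has the larger coefficient; after standardization the order flips, because each coefficient then measures the change in the target per standard deviation of the feature.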
You could also try some of sklearn's feature selection classes, which should do this automatically for you.
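For example, a minimal sketch with `SelectKBest` and `f_regression` from `sklearn.feature_selection`. This is just one of several selectors in that module, and the synthetic data and the choice of `k=2` are assumptions for the demo. Note that `f_regression` scores each feature by its linear relationship with the target, so it shares the limits of linear coefficients.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data, invented for this illustration only:
# six features, of which only columns 0 and 3 influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(0.0, 1.0, size=500)

# Keep the k features with the highest univariate F-score.
selector = SelectKBest(score_func=f_regression, k=2).fit(X, y)
selected = selector.get_support(indices=True)

print("F-scores:", selector.scores_)
print("selected feature indices:", selected)
```

Calling `selector.transform(X)` then returns only the selected columns, which can be passed on to whichever model you fit next.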
Upvotes: 2