CuriousToKnow

Reputation: 11

Is there a way to evaluate whether a model can identify the influential variables in data generated with make_classification?

I have a question about make_classification from scikit-learn. I created a dataset with make_classification (a binary classification task), and the aim is to test how well different models can distinguish important features from less important ones.

How can I set up an experiment that evaluates whether a model is able to identify the influential variables?

I have looked at the documentation of make_classification, but unfortunately it did not get me any further.

I have set the following parameters:

from sklearn.datasets import make_classification

X, y = make_classification(n_samples=50000, n_features=10, n_informative=5,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           n_clusters_per_class=2, class_sep=1, flip_y=0.01,
                           weights=[0.9, 0.1], shuffle=True, random_state=42)

How can I display the (in this case) 5 informative variables? Can I shape the importance of the features when generating data with make_classification? Which features does make_classification intend to be important? In a next step, I would then use feature-importance methods to verify whether a model detects the predefined influential variables.

Thank you, any ideas or advice are highly appreciated.

Upvotes: 0

Views: 37

Answers (1)

Federico

Reputation: 84

I'm not really sure if it's what you mean, but many of the classifiers in sklearn (in particular the tree-based ones, e.g. RandomForestClassifier) expose a feature_importances_ attribute after fitting. It tells you how much "weight" or "importance" the model gave to each feature. The same holds for the corresponding regression models.
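In your case, since you control the data-generating process, you can also make the ground truth explicit. A minimal sketch (reusing your parameters, but with shuffle=False, since make_classification then stacks the columns in a known order: the n_informative informative features first, then the redundant ones, then noise):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# With shuffle=False the column order is known: columns 0-4 are the
# informative features, columns 5-6 are redundant linear combinations
# of them, and columns 7-9 are pure noise.
X, y = make_classification(n_samples=50000, n_features=10, n_informative=5,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           n_clusters_per_class=2, class_sep=1, flip_y=0.01,
                           weights=[0.9, 0.1], shuffle=False, random_state=42)

clf = RandomForestClassifier(random_state=42)
clf.fit(X, y)

# Compare the model's importances against the known column roles.
for i, imp in enumerate(clf.feature_importances_):
    print(f"feature {i}: importance {imp:.3f}")

If the model picks up the intended signal, columns 0-4 should receive clearly higher importances than the noise columns 7-9; note that the redundant columns 5-6 may "steal" some importance from the informative features they are built from.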

Upvotes: 0
