Reputation: 11
I have a question about make_classification from scikit-learn. I have created a dataset with make_classification (binary classification task), and the aim is to test how well different models can distinguish important features from less important ones.
How can I set up an experiment in which I can evaluate whether a model is able to identify the variables that actually have an influence?
I have looked at the documentation of make_classification, but unfortunately it did not get me any further.
I have set the following:
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=50000, n_features=10, n_informative=5,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           n_clusters_per_class=2, class_sep=1,
                           flip_y=0.01, weights=[0.9, 0.1],
                           shuffle=True, random_state=42)
How can I display the - in this case - 5 informative variables? Can I shape the importance of the features when generating data with make_classification? Which features does make_classification intend to be important? In a next step, I would then use feature-importance methods to verify (or not) how well a model detects the preset feature importances / the variables with influence.
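One way to make the informative columns identifiable is to generate the data with shuffle=False: according to the make_classification documentation, the columns are then laid out in a fixed order (informative first, then redundant, then repeated, then noise). A minimal sketch, using the same parameters as above:

```python
from sklearn.datasets import make_classification

# With shuffle=False, make_classification places the columns in a fixed
# order: the n_informative features first, then the n_redundant features
# (linear combinations of the informative ones), then the n_repeated
# features, and finally the noise features.
X, y = make_classification(n_samples=50000, n_features=10, n_informative=5,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           n_clusters_per_class=2, class_sep=1,
                           flip_y=0.01, weights=[0.9, 0.1],
                           shuffle=False, random_state=42)

informative_idx = list(range(5))    # columns 0-4 are informative
redundant_idx = list(range(5, 7))   # columns 5-6 are combinations of them
noise_idx = list(range(7, 10))      # columns 7-9 carry no signal
```

With the ground-truth indices known, any feature-importance method can be scored against them.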
Thank you, any ideas or advice are highly appreciated.
Upvotes: 0
Views: 37
Reputation: 84
I'm not really sure if it's what you mean, but many of the classifiers in sklearn (in particular the tree-based ones) expose a feature_importances_ attribute after fitting (see e.g. RandomForestClassifier). This is how much "weight" or "importance" the model gave to each feature. The same is true for the corresponding regression models.
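A short sketch of how this could be combined with the asker's setup, assuming the data is generated with shuffle=False so the informative columns are known to be 0-4 (the specific indices rely on that setting):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# shuffle=False keeps the documented column order, so the 5 informative
# features are columns 0-4, the redundant ones 5-6, the noise ones 7-9.
X, y = make_classification(n_samples=50000, n_features=10, n_informative=5,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           n_clusters_per_class=2, class_sep=1,
                           flip_y=0.01, weights=[0.9, 0.1],
                           shuffle=False, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X, y)

# feature_importances_ is an attribute of the fitted model,
# with one value per column; the values sum to 1.
for idx, imp in enumerate(clf.feature_importances_):
    kind = "informative" if idx < 5 else ("redundant" if idx < 7 else "noise")
    print(f"feature {idx:2d} ({kind}): {imp:.3f}")
```

If the model works as hoped, the noise columns should receive markedly lower importance than the informative and redundant ones.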
Upvotes: 0