Aditya

Reputation: 425

parameters of make_classification function in sklearn

I am trying to generate synthetic data using make_classification function:

import sklearn.datasets as sk
X, y = sk.make_classification(n_samples=10, 
                              n_features=3, n_informative=2, n_redundant=0, n_classes=2, 
                              n_clusters_per_class=1, weights=None, random_state=1)
print(X)

What do the parameters of make_classification mean?

Upvotes: 0

Views: 3193

Answers (1)

TC Arlen

Reputation: 1482

A more specific question would be good, but here is some help.

  • n_samples - total number of samples (rows) to generate. That's why the returned design matrix X has shape (n_samples, n_features) and y has shape (n_samples,).
  • n_features - total number of features (columns) in X, counting informative, redundant, repeated and pure-noise features together.
  • n_informative - number of features that actually carry information about the class, i.e. the ones that help a classifier separate your classes. Each class is built from Gaussian clusters placed around the vertices of a hypercube in an n_informative-dimensional subspace. With n_redundant=0 (and the default n_repeated=0), the remaining n_features - n_informative features (1 in your case) are pure random noise.
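  • weights - the proportion of samples assigned to each class. In your case it is None, so the classes are balanced: with n_samples=10 you get 5 samples of each class. The default flip_y=0.01 randomly reassigns a small fraction of labels, so the final counts are not strictly guaranteed.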
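A minimal sketch to check these points yourself, assuming a recent scikit-learn. It sets flip_y=0 (so no labels are randomly reassigned and the class counts come out exact) and, in the last call, shuffle=False (so the columns stay in generation order: informative features first, then noise). Neither setting is in the original call; both are only there to make the effects easy to see.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same call as in the question, plus flip_y=0 so the class counts are exact.
X, y = make_classification(n_samples=10, n_features=3, n_informative=2,
                           n_redundant=0, n_classes=2, n_clusters_per_class=1,
                           weights=None, flip_y=0, random_state=1)
print(X.shape)         # (10, 3): one row per sample, one column per feature
print(np.bincount(y))  # [5 5]: weights=None gives balanced classes

# weights sets the class proportions: 90 % of samples in class 0, 10 % in class 1.
_, y_imb = make_classification(n_samples=100, n_features=3, n_informative=2,
                               n_redundant=0, n_clusters_per_class=1,
                               weights=[0.9, 0.1], flip_y=0, random_state=1)
print(np.bincount(y_imb))  # [90 10]

# With shuffle=False, columns 0 and 1 are the informative features and
# column 2 is pure noise (n_redundant=0, n_repeated=0 by default).
X_ord, y_ord = make_classification(n_samples=5000, n_features=3, n_informative=2,
                                   n_redundant=0, n_clusters_per_class=1,
                                   flip_y=0, shuffle=False, random_state=1)
# Absolute difference between the per-class means of each column.
mean_gap = np.abs(X_ord[y_ord == 0].mean(axis=0) - X_ord[y_ord == 1].mean(axis=0))
print(mean_gap.round(2))  # clear gap in at least one informative column, near 0 for the noise column
```

The per-class mean gap is just a quick diagnostic: the class centroids sit on different hypercube vertices, so at least one informative column separates the classes, while the noise column is drawn independently of the label.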

Upvotes: 1
