dataset for plot

I need to create my own data to develop a classifier and I don't know how.

Upvotes: -1

Views: 139

Answers (2)

Yahya

Reputation: 14072

You can do that very efficiently by using sklearn.datasets.make_classification.

It generates a random n-class classification problem, with a lot of options and high flexibility.

Example:

from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, n_repeated=0, n_clusters_per_class=1,
                           n_classes=2, shuffle=True, random_state=2021)

The call above creates a data set with 100 samples, 2 features (both informative), 2 classes, and 1 cluster per class, then shuffles the samples. Setting random_state just makes the result reproducible.
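If you want to check what came back before plotting, here is a minimal sketch that inspects the arrays. It reuses the same call; the use of np.unique and np.bincount is my own addition. I have left out the expected class counts because make_classification flips a small fraction of labels by default (flip_y), so the split is not guaranteed to be exactly even.

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, n_repeated=0, n_clusters_per_class=1,
                           n_classes=2, shuffle=True, random_state=2021)

# X is the feature matrix: one row per sample, one column per feature
print(X.shape)
# y holds one integer class label per sample
print(y.shape)
# The distinct labels and how many samples carry each one
print(np.unique(y))
print(np.bincount(y))
```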

Then you can plot it as:

import matplotlib.pyplot as plt

# Color each point by its class label
plt.scatter(X[:, 0], X[:, 1], marker='o', c=y, s=25, edgecolor='k')
plt.show()

Here is a sample of what the output looks like:

[image: scatter plot of the generated points, colored by class]

Upvotes: 1

Ananda

Reputation: 3272

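You can just draw samples from multivariate normal distributions with a specific mean and covariance. Note that the second argument of np.random.multivariate_normal is a covariance matrix, not a standard deviation.

import numpy as np
import matplotlib.pyplot as plt

# Covariance shared by every cluster: variance 0.5 on each axis, no correlation
cov = [[0.5, 0], [0, 0.5]]

# First class: two clusters on one diagonal
X1 = np.random.multivariate_normal([2, -2], cov, size=100)
X2 = np.random.multivariate_normal([-2, 2], cov, size=100)
X = np.vstack((X1, X2))

# Second class: two clusters on the other diagonal
Y1 = np.random.multivariate_normal([2, 2], cov, size=100)
Y2 = np.random.multivariate_normal([-2, -2], cov, size=100)
Y = np.vstack((Y1, Y2))

# Each scatter call gets its own color, so the two classes are easy to tell apart
plt.scatter(X[:, 0], X[:, 1])
plt.scatter(Y[:, 0], Y[:, 1])
plt.show()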
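Since the goal is a classifier, the two point clouds still need to be combined into one feature matrix with a label per row. Below is a rough sketch of one way to do that. The names class0, class1, features and labels, the seeded generator, and the choice of KNeighborsClassifier are all my own assumptions for illustration; a k-nearest-neighbours model is used only because the clusters sit on opposite diagonals, which a single straight line cannot separate.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Seeded generator so the sketch is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(0)
cov = [[0.5, 0], [0, 0.5]]

# Same cluster centers as in the plot above, one pair of clusters per class
class0 = np.vstack((rng.multivariate_normal([2, -2], cov, size=100),
                    rng.multivariate_normal([-2, 2], cov, size=100)))
class1 = np.vstack((rng.multivariate_normal([2, 2], cov, size=100),
                    rng.multivariate_normal([-2, -2], cov, size=100)))

# One feature matrix and one label vector (0 for class0 rows, 1 for class1 rows)
features = np.vstack((class0, class1))
labels = np.concatenate((np.zeros(len(class0), dtype=int),
                         np.ones(len(class1), dtype=int)))

# Hold out a quarter of the points to estimate accuracy on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels)

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(accuracy)
```

Any other scikit-learn classifier can be dropped in the same way, as long as it can handle a decision boundary that is not a straight line.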

Upvotes: 0
