Reputation: 321
I'm working on a Python function that should model a Gaussian distribution, but I'm stuck.
import numpy.random as rnd
import numpy as np
def genData(co1, co2, M):
    X = rnd.randn(2, 2M + 1)
    t = rnd.randn(1, 2M + 1)
    numpy.concatenate(X, co1)
    numpy.concatenate(X, co2)
    return(X, t)
I'm trying to generate two clusters of size M: cluster 1 is centered at co1 and cluster 2 is centered at co2. X holds the data points I'm going to graph, and t holds the target values (1 for cluster 1, 2 for cluster 2) so I can color the points by cluster.
In that case, t is an array of 2M values (1s and 2s) and X is of size 2M * 1, where t[i] is 1 if X[i] is in cluster 1 and 2 if X[i] is in cluster 2.
I figured the best way to start is to generate the array using NumPy's random module. What I'm confused about is how to center the points on each cluster.
Would the best way be to generate a cluster of size M, then add co1 to each of the points? How would I make it random, though, and make sure the point for each t[i] is colored correctly?
I'm using this function to graph the data:
def graphData():
    co1 = (0.5, -0.5)
    co2 = (-0.5, 0.5)
    M = 1000
    X, t = genData(co1, co2, M)
    colors = np.array(['r', 'b'])
    plt.figure()
    plt.scatter(X[:, 0], X[:, 1], color = colors[t], s = 10)
Upvotes: 6
Views: 10315
Reputation: 2838
For your purpose, I would go for sklearn's sample generator make_blobs:
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
centers = [(-5, -5), (5, 5)]
cluster_std = [0.8, 1]
X, y = make_blobs(n_samples=100, cluster_std=cluster_std, centers=centers, n_features=2, random_state=1)
# y holds the cluster index (0 or 1) of each row of X
plt.scatter(X[y == 0, 0], X[y == 0, 1], color="red", s=10, label="Cluster1")
plt.scatter(X[y == 1, 0], X[y == 1, 1], color="blue", s=10, label="Cluster2")
plt.legend()
plt.show()
You can generate multi-dimensional clusters with this. X holds the data points and y indicates which cluster the corresponding point in X belongs to.
This might be too much for what you are trying to achieve in this case, but generally, I think it's better to rely on more general and better-tested library code that can be used in other cases as well.
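If you want to keep the genData(co1, co2, M) shape from the question, here is a minimal sketch of a wrapper around make_blobs. The name gen_data, the std default of 0.2 and the seed parameter are my own assumptions, not something from the question. Note that the labels come back as 0 and 1 rather than 1 and 2.

```python
import numpy as np
from sklearn.datasets import make_blobs


def gen_data(co1, co2, M, std=0.2, seed=None):
    """Return (X, t): X has shape (2*M, 2), t holds the labels 0 or 1.

    std and seed are illustrative parameters (assumptions of this sketch).
    """
    # A list for n_samples requests exactly M points for each center.
    X, t = make_blobs(
        n_samples=[M, M],
        centers=[co1, co2],
        cluster_std=std,
        random_state=seed,
    )
    return X, t


X, t = gen_data((0.5, -0.5), (-0.5, 0.5), M=1000, seed=0)
print(X.shape, np.bincount(t))
```

Because the labels are 0 and 1, they can index a two-element color array directly, as in colors[t] from the question's plotting function.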
Upvotes: 9
Reputation: 2128
You can use something like the following code:
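import numpy as np
import matplotlib.pyplot as plt
center1 = (50, 60)
center2 = (80, 20)
distance = 20
# x is uniform on [center, center + distance); y is normal around the center
x1 = np.random.uniform(center1[0], center1[0] + distance, size=(100,))
y1 = np.random.normal(center1[1], distance, size=(100,))
x2 = np.random.uniform(center2[0], center2[0] + distance, size=(100,))
y2 = np.random.normal(center2[1], distance, size=(100,))
# separate scatter calls give each cluster its own default color
plt.scatter(x1, y1)
plt.scatter(x2, y2)
plt.show()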
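If you want both coordinates to be Gaussian and also need the label array t from the question, a NumPy-only sketch could look like the one below. It does what the question suggests: draw standard-normal points and add the center to each of them. The name gen_data, the std default of 0.2 and the seed parameter are assumptions of mine, and the labels are 0 and 1 so that they can be used as indices.

```python
import numpy as np


def gen_data(co1, co2, M, std=0.2, seed=None):
    """Return (X, t): X has shape (2*M, 2), t holds the labels 0 or 1.

    std and seed are illustrative parameters (assumptions of this sketch).
    """
    rng = np.random.default_rng(seed)
    # Standard-normal noise scaled by std, then shifted to each center.
    X1 = rng.standard_normal((M, 2)) * std + np.asarray(co1)
    X2 = rng.standard_normal((M, 2)) * std + np.asarray(co2)
    X = np.vstack([X1, X2])
    # First M rows belong to cluster 0, the next M rows to cluster 1.
    t = np.repeat([0, 1], M)
    # Apply one permutation to both arrays so each label stays with its point.
    order = rng.permutation(2 * M)
    return X[order], t[order]


X, t = gen_data((0.5, -0.5), (-0.5, 0.5), M=1000, seed=0)
print(X.shape, t[:10])
```

With these labels, plt.scatter(X[:, 0], X[:, 1], color=colors[t], s=10) from the question works as written, since colors has exactly two entries.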
Upvotes: 3