Reputation: 11
I'm conducting soft clustering on a data set, and I want to create a graphic similar to the image posted. I want to show each data point's membership between two (or more) clusters graphically, but I'm not sure how to go about it. I've used criteria to assign colours to the data points, but I don't know how to create the more dynamic sort of graphic seen below. Any help appreciated.
Upvotes: 1
Views: 327
Reputation: 4273
The GaussianMixture class in scikit-learn does something close to what the question asks. Specifically, predict_proba(X) returns an array with the probability of each point in X belonging to each mixture component. In the example below we fit two mixture components. Because each point's probabilities sum to 1, the last two plots should be complementary (opposites of each other):
from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt
X, _ = make_moons(noise=0.05)
mix = GaussianMixture(n_components=2).fit(X)
probs = mix.predict_proba(X)
fig, ax = plt.subplots(1, 3, sharey=True)
ax[0].scatter(X[:, 0], X[:, 1])
ax[1].scatter(X[:, 0], X[:, 1], c=probs[:, 0])
ax[2].scatter(X[:, 0], X[:, 1], c=probs[:, 1])
plt.show()
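Because each row of predict_proba sums to 1, the same idea can be extended to more than two clusters by blending one colour per component instead of plotting one panel per component. The sketch below is an illustration under assumptions, not part of the answer above: the make_blobs data, the choice of three components, and the red/green/blue base colours are all arbitrary. Each point's colour is the membership-weighted average of the base colours, so a point split evenly between two components gets a colour halfway between theirs.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Hypothetical example data: three overlapping blobs
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=2.0, random_state=0)

# Fit a 3-component mixture; each row of probs holds one point's memberships
mix = GaussianMixture(n_components=3, random_state=0).fit(X)
probs = mix.predict_proba(X)  # shape (n_samples, n_components), rows sum to 1

# One RGB base colour per component (arbitrary choice: red, green, blue)
base_colors = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.6, 0.0],
    [0.0, 0.0, 1.0],
])

# Membership-weighted average of the base colours; clip guards against
# tiny floating-point overshoot outside the valid [0, 1] RGB range
point_colors = np.clip(probs @ base_colors, 0.0, 1.0)

fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c=point_colors, s=15)
plt.show()
```

Points deep inside one component come out close to that component's pure colour, while points in overlap regions show mixed hues.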
Upvotes: 0
Reputation: 168
I think markers are just the thing you're looking for:
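import numpy as np
import matplotlib.pyplot as plt

x1 = y1 = 1
x2 = y2 = 2
dx = np.random.rand(10)
dy = np.random.rand(10)
# two groups of 10 random points, offset from (1, 1) and (2, 2)
x = np.array([x1 + dx, x2 + dx]).ravel()
y = np.array([y1 + dy, y2 + dy]).ravel()
# choose a marker per point depending on which side of x + y = threshold it lies
threshold = 4
markers = np.array(["o" if xy > threshold else "h" for xy in x + y])
fig, ax = plt.subplots()
# scatter takes a single marker per call, so plot each marker group separately
for marker in np.unique(markers):
    index = markers == marker
    ax.scatter(x[index], y[index], marker=marker)
plt.show()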
Adding some additional code to control color and transparency (alpha):
import numpy as np
import matplotlib.pyplot as plt

x1 = y1 = 1
x2 = y2 = 2
dx = np.random.rand(10)
dy = np.random.rand(10)
x = np.array([x1 + dx, x2 + dx]).ravel()
y = np.array([y1 + dy, y2 + dy]).ravel()
threshold = 4
markers = np.array(["o" if xy > threshold else "h" for xy in x + y])
blue_color = "midnightblue"  # predefined matplotlib named colors
pink_color = "orchid"
colors = [blue_color if marker == "o" else pink_color for marker in markers]
# alpha from each point's distance to the threshold, scaled to [0, 1]:
# points nearest the threshold are most opaque, the farthest is fully transparent
alphas = np.array([abs(xy - threshold) for xy in x + y])
alphas = 1 - alphas / np.max(alphas)
fig, ax = plt.subplots()
# one scatter call per point so each point gets its own marker, color and alpha
for i in range(len(x)):
    ax.scatter(x[i], y[i], marker=markers[i], color=colors[i], alpha=alphas[i])
plt.show()
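Calling scatter once per point can get slow on larger data sets. One possible alternative, sketched here assuming the same marker, color, and transparency rules as above, packs each point's color and alpha into a single RGBA array with matplotlib.colors.to_rgba_array, so only one scatter call per marker shape is needed. The seeded generator (np.random.default_rng) is an addition made only so the example is reproducible.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba_array

rng = np.random.default_rng(0)  # seeded only to make the example reproducible
x1 = y1 = 1
x2 = y2 = 2
dx = rng.random(10)
dy = rng.random(10)
x = np.concatenate([x1 + dx, x2 + dx])
y = np.concatenate([y1 + dy, y2 + dy])

threshold = 4
s = x + y
markers = np.where(s > threshold, "o", "h")

# Same transparency rule as above: points nearest the threshold are most opaque
dist = np.abs(s - threshold)
alphas = 1 - dist / dist.max()

# Convert named colors to an (n, 4) RGBA array and write alpha into column 3
names = ["midnightblue" if m == "o" else "orchid" for m in markers]
rgba = to_rgba_array(names)
rgba[:, 3] = alphas

fig, ax = plt.subplots()
# One scatter call per marker shape instead of one per point
for marker in np.unique(markers):
    idx = markers == marker
    ax.scatter(x[idx], y[idx], marker=marker, c=rgba[idx])
plt.show()
```

Because scatter still accepts only one marker per call, the loop over np.unique(markers) remains, but color and alpha vary per point inside each call.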
Upvotes: 1