Jazim Sohail

Reputation: 11

Colormap scatter plot dependent on cluster membership

I'm conducting soft clustering on a data set and want to create a graphic similar to the image posted. I want to show a data point's membership between two (or more) clusters in graphical form, but I'm not sure how to go about it. I've used criteria to assign colours to a data point, but am unsure how to create the more dynamic sort of graphic seen below. Any help appreciated.

enter image description here

Upvotes: 1

Views: 327

Answers (2)

Alexander L. Hayes

Reputation: 4273

The GaussianMixture class in scikit-learn does something close to what the question asks.

Specifically, predict_proba(X) returns an array with the probability of each point in X belonging to each component, one column per component. In the example below we fit two mixture components, so the two columns sum to 1 for every point and the last two plots should be opposites of each other:

from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt

X, _ = make_moons(noise=0.05)

mix = GaussianMixture(n_components=2).fit(X)
probs = mix.predict_proba(X)

fig, ax = plt.subplots(1, 3, sharey=True)
ax[0].scatter(X[:, 0], X[:, 1])
ax[1].scatter(X[:, 0], X[:, 1], c=probs[:, 0])
ax[2].scatter(X[:, 0], X[:, 1], c=probs[:, 1])
plt.show()

Three scatter plots of the synthetic moons data set. Left shows the original data, middle shows probability of being in cluster 0, right is the exact opposite of the middle.
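
To get a single plot where each point's colour shows its mix of memberships, one option is to blend one base colour per cluster using the rows of predict_proba as weights. The following is only a minimal sketch under that assumption: blend_colors is a hypothetical helper (not part of scikit-learn or matplotlib), and the colour names and random_state values are arbitrary choices.

```python
import numpy as np


def blend_colors(probs, base_colors):
    """Mix one RGB colour per cluster, weighted by each point's memberships.

    probs:       (n_points, n_clusters) membership probabilities, rows sum to 1
    base_colors: (n_clusters, 3) RGB values in [0, 1]
    """
    probs = np.asarray(probs, dtype=float)
    base_colors = np.asarray(base_colors, dtype=float)
    # A weighted average of valid RGB colours is again a valid RGB colour;
    # the clip only guards against floating-point round-off.
    return np.clip(probs @ base_colors, 0.0, 1.0)


def main():
    # Imported here so blend_colors itself only needs NumPy.
    import matplotlib.pyplot as plt
    from matplotlib.colors import to_rgb
    from sklearn.datasets import make_moons
    from sklearn.mixture import GaussianMixture

    X, _ = make_moons(noise=0.05, random_state=0)
    mix = GaussianMixture(n_components=2, random_state=0).fit(X)
    probs = mix.predict_proba(X)

    # Arbitrary illustrative colours, one per mixture component.
    base = [to_rgb("midnightblue"), to_rgb("orchid")]

    fig, ax = plt.subplots()
    # scatter accepts an (n, 3) array of RGB rows for c.
    ax.scatter(X[:, 0], X[:, 1], c=blend_colors(probs, base))
    plt.show()


if __name__ == "__main__":
    main()
```

Points that belong almost entirely to one component take that component's colour, while points with split membership land somewhere in between. The same matrix product works for more than two clusters as long as base_colors has one row per component.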

Upvotes: 0

Martin Gardfjell

Reputation: 168

I think markers are just the thing you're looking for:

import numpy as np
import matplotlib.pyplot as plt

x1 = y1 = 1
x2 = y2 = 2

dx = np.random.rand(10)
dy = np.random.rand(10)

x = np.array([x1 + dx, x2 + dx]).ravel()
y = np.array([y1 + dy, y2 + dy]).ravel()

threshold = 4
markers = np.array(["o" if xy > threshold else "h" for xy in x + y])


fig, ax = plt.subplots()
for marker in np.unique(markers):
    index = markers == marker 
    ax.scatter(x[index], y[index], marker=marker)


Adding some additional code to control color and transparency (alpha):

import numpy as np
import matplotlib.pyplot as plt


x1 = y1 = 1
x2 = y2 = 2

dx = np.random.rand(10)
dy = np.random.rand(10)

x = np.array([x1 + dx, x2 + dx]).ravel()
y = np.array([y1 + dy, y2 + dy]).ravel()

threshold = 4
markers = np.array(["o" if xy > threshold else "h" for xy in x + y])

blue_color = "midnightblue"  # named matplotlib colors
pink_color = "orchid"
colors = [blue_color if marker == "o" else pink_color for marker in markers]

# Opacity falls with distance from the threshold; the farthest point gets alpha 0.
alphas = np.array([abs(xy - threshold) for xy in x + y])
alphas = 1 - alphas / np.max(alphas)


fig, ax = plt.subplots()
for i in range(len(x)):
    ax.scatter(x[i], y[i], marker=markers[i], color=colors[i], alpha=alphas[i])
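
Calling scatter once per point works, but it gets slow for larger data sets. A possible alternative, sketched here under my own assumptions, is to fold each point's alpha into an RGBA colour array and call scatter once per marker. colors_with_alpha is a hypothetical helper, the seed is arbitrary, and the alpha floor of 0.2 is an added choice (not in the code above) so that no point disappears completely.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba_array


def colors_with_alpha(color_names, alphas):
    """Return an (n, 4) RGBA array with a separate alpha for every point."""
    rgba = to_rgba_array(color_names)
    rgba[:, 3] = np.clip(alphas, 0.0, 1.0)  # overwrite the alpha channel
    return rgba


def main():
    rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
    x = np.concatenate([1 + rng.random(10), 2 + rng.random(10)])
    y = np.concatenate([1 + rng.random(10), 2 + rng.random(10)])

    threshold = 4
    above = x + y > threshold
    names = ["midnightblue" if a else "orchid" for a in above]

    # Same idea as above: opacity falls with distance from the threshold,
    # but scaled to stay in [0.2, 1] so the farthest point remains visible.
    dist = np.abs(x + y - threshold)
    alphas = 1 - 0.8 * dist / dist.max()
    rgba = colors_with_alpha(names, alphas)

    fig, ax = plt.subplots()
    # One scatter call per marker shape instead of one per point.
    for marker, mask in (("o", above), ("h", ~above)):
        ax.scatter(x[mask], y[mask], marker=marker, color=rgba[mask])
    plt.show()


if __name__ == "__main__":
    main()
```

Passing the alpha through the colour array avoids relying on array-valued alpha support in scatter, which depends on the installed matplotlib version.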


Upvotes: 1
