DsCpp

Reputation: 2489

How to create a scatter plot with two colors per dot?

I'm trying to plot both the ground-truth and my classification simultaneously in matplotlib.

Currently, I only plot the ground truth, after applying t-SNE to the feature space and adding the edges, using the following code:

import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

# one color per vertex; labels are shifted by 1 to index into 'rgbkm'
cols = ['rgbkm'[lbl] for lbl in data.y.cpu().numpy() - 1]

# graph edges, drawn between the embedded vertex positions
lc = LineCollection(X_embedded[out_dict['edges']], linewidth=0.05)
fig = plt.figure()
plt.gca().add_collection(lc)
plt.xlim(X_embedded[:, 0].min(), X_embedded[:, 0].max())
plt.ylim(X_embedded[:, 1].min(), X_embedded[:, 1].max())
plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=cols)

This gives the following plot: [image]

Instead, I would like to color each vertex with two colors, in the following way:

[image]

Upvotes: 3

Views: 1706

Answers (1)

JohanC

Reputation: 80329

Here are two approaches.

The dots of a regular scatter plot have an interior color and an edge color. scatter accepts an array for either one of them, but not for both. So you could loop over the edge colors and draw each group onto the same plot, passing an array of interior colors in each call. Adjusting linewidth may help to make both the true and the predicted color visible.

Matplotlib's plot function accepts marker fill styles, which can be bicolored, split either top-bottom or left-right. Each call to plot accepts only one such style (one pair of colors). So, for 5 colors, there are 25 combinations, which can be drawn in a loop.
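A minimal sketch of the first approach, assuming the classes are given as integer indices 0..4 and using the same 'rgbkm' palette as in the question. The helper name scatter_face_and_edge and the random demo data are made up for illustration. The interior of each dot shows the true class and the outline shows the predicted class.

```python
import numpy as np
from matplotlib import pyplot as plt


def scatter_face_and_edge(ax, x, y, idx_true, idx_pred, colors,
                          size=60, edge_width=2):
    """Interior color = true class, edge color = predicted class.

    Hypothetical helper: one scatter call per predicted class, so each
    call gets a list of interior colors and a single edge color.
    """
    collections = []
    for pc, edge_color in enumerate(colors):
        mask = idx_pred == pc
        if not mask.any():
            continue  # no point was predicted as this class
        face = [colors[t] for t in idx_true[mask]]
        coll = ax.scatter(x[mask], y[mask], c=face, edgecolors=edge_color,
                          linewidths=edge_width, s=size)
        collections.append(coll)
    return collections


if __name__ == "__main__":
    rng = np.random.default_rng(0)          # demo data only
    n_points = 250
    idx_true = rng.integers(0, 5, n_points)  # stand-in for the ground truth
    idx_pred = rng.integers(0, 5, n_points)  # stand-in for the predictions
    x = rng.normal(0, 1, n_points)
    y = rng.normal(0, 1, n_points)

    fig, ax = plt.subplots(figsize=(8, 6))
    scatter_face_and_edge(ax, x, y, idx_true, idx_pred, list('rgbkm'))
    plt.show()
```

With many points, a thicker edge_width makes the predicted color easier to see, at the cost of hiding more of the interior.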

Bonus points:

While looping through the colors, plot can generate legend labels with the corresponding bicolored dot.

Here is some code to illustrate the concepts:

from matplotlib import pyplot as plt
from matplotlib.collections import LineCollection
import numpy as np

N = 50

labels = ['ant', 'bee', 'cat', 'dog', 'elk']  # suppose these are the labels for the prediction
colors = list('rgbkm') # a list of 5 colors
cols_true = np.repeat(range(5), N)  # suppose the first N have true color 0, the next N true color 1, ...
cols_pred = np.random.randint(0, 5, N * 5)  # as a demo, take a random number for each predicted color

# for x and y, suppose some 2D gaussian normal distribution around some centers,
#   this would make the 'true' colors nicely grouped 
x = np.concatenate([np.random.normal(cx, 2, N) for cx in [5, 9, 7, 2, 2]])
y = np.concatenate([np.random.normal(cy, 1.5, N) for cy in [2, 5, 9, 8, 3]])

fig, ax = plt.subplots(figsize=(10,6))
for tc in range(5):
    for pc in range(5):
        mask = (cols_true == tc) & (cols_pred == pc)
        plt.plot(x[mask], y[mask], c=colors[tc], markerfacecoloralt=colors[pc],
                 marker='.', linestyle='', markeredgecolor='None',
                 markersize=15, fillstyle='left', markeredgewidth=0,
                 label=f'Tr: {labels[tc]} - Pr: {labels[pc]}')
plt.legend(loc='upper right', bbox_to_anchor=(1, -0.1), fontsize=10, ncol=5)
plt.tight_layout()
plt.show()

resulting plot
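To get closer to the plot in the question, the half-filled markers can be drawn on top of the graph edges. This is only a sketch under assumptions: xy stands in for X_embedded with shape (n_vertices, 2), edges is assumed to be an integer array of shape (n_edges, 2) holding vertex indices, and the function name plot_graph_two_colors is made up for illustration.

```python
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.collections import LineCollection


def plot_graph_two_colors(ax, xy, edges, idx_true, idx_pred, colors):
    """Draw the edges, then one half-filled marker per vertex.

    Left half: color of the true class; right half: predicted class.
    """
    # xy[edges] has shape (n_edges, 2, 2): start and end point of each edge
    ax.add_collection(LineCollection(xy[edges], linewidths=0.5,
                                     colors='gray', zorder=1))
    lines = []
    for tc in range(len(colors)):
        for pc in range(len(colors)):
            mask = (idx_true == tc) & (idx_pred == pc)
            if not mask.any():
                continue  # skip combinations that do not occur
            lines += ax.plot(xy[mask, 0], xy[mask, 1], linestyle='',
                             marker='o', markersize=10, fillstyle='left',
                             color=colors[tc], markerfacecoloralt=colors[pc],
                             markeredgewidth=0, zorder=2)
    return lines


if __name__ == "__main__":
    rng = np.random.default_rng(1)               # demo data only
    n_vertices = 60
    xy = rng.normal(0, 1, (n_vertices, 2))        # stand-in for X_embedded
    edges = rng.integers(0, n_vertices, (80, 2))  # random vertex pairs
    idx_true = rng.integers(0, 5, n_vertices)
    idx_pred = rng.integers(0, 5, n_vertices)

    fig, ax = plt.subplots(figsize=(8, 6))
    plot_graph_two_colors(ax, xy, edges, idx_true, idx_pred, list('rgbkm'))
    plt.show()
```

The zorder values keep the markers above the edges, so the two halves stay readable even where many edges cross.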

Upvotes: 5
