Reputation: 1
I am trying to import tabular data into Python and plot one variable against another. I also want to group the data points by two other variables from the same table.
One of these variables (the one I want to map to colour) has only 3 possible values. The other (the one I want to map to marker shape) has only 5. I can easily group the data by either. The issue comes with plotting, because not every shape group contains all 3 values of the "colour" variable. I can get the scatter plot to show shapes or colours on their own easily; it is when I combine them that I have an issue.
At the moment I can get the colours plotted, but each data point is drawn twice: once with the correct shape and once as a standard round marker. If I remove the call that causes the duplicate points, however, the colours are no longer correct.
This is my current code (with example data). I have given the colour variable letters, but the real data is just as simple:
import matplotlib.pyplot as plt
import numpy as np
r = np.array([600, 2000, 980, 1770, 920, 1100, 220])
t = np.array([2.7, 12.67, 10.54, 1.3, 16.1, 0.92, 13.56])
spectra_type = np.array(['A', 'A', 'B', 'A', 'C', 'B', 'A'])
spectra_num = np.array([{'A': 0, 'B': 1, 'C': 2}[i] for i in spectra_type])
i = np.array(['Shape1','Shape2','Shape3','Shape4','Shape5','Shape2','Shape4'])
shape1 = np.where(i=='Shape1')[0]
shape2 = np.where(i=='Shape2')[0]
shape3 = np.where(i=='Shape3')[0]
shape4 = np.where(i=='Shape4')[0]
shape5 = np.where(i=='Shape5')[0]
plt.figure('fig 1')
plt.xlabel('x')
plt.ylabel('y')
plt.scatter(t[shape1], r[shape1], c=spectra_num[shape1], marker='D', label='Shape1')
plt.scatter(t[shape2], r[shape2], c=spectra_num[shape2], marker='^', label='Shape2')
plt.scatter(t[shape3], r[shape3], c=spectra_num[shape3], marker='o', label='Shape3')
plt.scatter(t[shape4], r[shape4], c=spectra_num[shape4], marker='s', label='Shape4')
plt.scatter(t[shape5], r[shape5], c=spectra_num[shape5], marker='*', label='Shape5')
first_legend = plt.legend(loc='upper left')
plt.gca().add_artist(first_legend)
scatter = plt.scatter(t, r, c=spectra_num)
plt.legend(handles=scatter.legend_elements()[0], labels=['A', 'B', 'C'], title='Colour')
This gives me the following graph. As you can see, the shapes are all there, but each is overlaid with another "regular" marker.
Any advice would be much appreciated!
Upvotes: 0
Views: 66
Reputation: 25093
Everything is pretty standard, except how I compute the handles for the legend, and how I place the legend outside of the Axes using a new (Matplotlib 3.7) feature of the Figure.legend() loc keyword argument.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
np.random.seed(20250227)
N = 80 # no. of points
N1 = 3 # no. of different properties in category 1
N2 = 5 # no. of different properties in category 2
names1 = 'Ear Eye Nose'.split()
names2 = 'Africa Asia Europe N.America S.America'.split()
d1 = dict(zip(range(N1), ['C'+str(i) for i in range(N1)]))
markers = list(Line2D.filled_markers)
np.random.shuffle(markers)
markers = markers[:N2]
d2 = dict(zip(range(N2), markers))
# fake data
x, y = np.random.rand(2, N)
cat1 = np.random.randint(N1, size=N)
cat2 = np.random.randint(N2, size=N)
fig = plt.figure(figsize=(6, 6), layout='constrained')
for c2 in range(N2):
    marker = markers[c2]
    x2 = [xx for xx, cc in zip(x, cat2) if cc==c2]
    y2 = [yy for yy, cc in zip(y, cat2) if cc==c2]
    colors = [d1[color] for color, cc in zip(cat1, cat2) if cc==c2]
    plt.scatter(x2, y2,
                color=colors,
                marker=marker,
                )
plt.gca().set_aspect(1)
plt.xlim((-0.05, 1.05))
plt.ylim((-0.05, 1.05))
handles = [Line2D([], [],
                  color=d1[c1],
                  marker=d2[c2],
                  lw=0,
                  label=f'({names1[c1]}, {names2[c2]})'
                  )
           for c2 in range(N2) for c1 in range(N1)]
fig.legend(handles=handles, ncols=5, loc='outside upper center', fontsize='x-small',
           title='Cat1 is mapped to different colors, Cat2 to different shapes')
plt.show()
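If one legend entry per (colour, shape) pair is more than you need, the same proxy-handle idea can produce two compact legends instead. Below is a minimal sketch on the question's example data; the `colour_of` and `marker_of` dictionaries and the colour choices are my own assumptions, not part of the code above.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

# Example data from the question.
r = np.array([600, 2000, 980, 1770, 920, 1100, 220])
t = np.array([2.7, 12.67, 10.54, 1.3, 16.1, 0.92, 13.56])
spectra = np.array(['A', 'A', 'B', 'A', 'C', 'B', 'A'])
shape = np.array(['Shape1', 'Shape2', 'Shape3', 'Shape4',
                  'Shape5', 'Shape2', 'Shape4'])

# Assumed mappings: one fixed colour per spectra value, one marker per shape.
colour_of = {'A': 'C0', 'B': 'C1', 'C': 'C2'}
marker_of = {'Shape1': 'D', 'Shape2': '^', 'Shape3': 'o',
             'Shape4': 's', 'Shape5': '*'}

fig, ax = plt.subplots()
for name, marker in marker_of.items():
    mask = shape == name
    # Explicit colour names, so each point's colour does not depend on its group.
    ax.scatter(t[mask], r[mask],
               c=[colour_of[s] for s in spectra[mask]],
               marker=marker)

# Proxy handles: empty Line2D objects that exist only to populate the legends.
shape_handles = [Line2D([], [], color='black', marker=m, lw=0, label=n)
                 for n, m in marker_of.items()]
colour_handles = [Line2D([], [], color=c, marker='o', lw=0, label=n)
                  for n, c in colour_of.items()]

shape_legend = ax.legend(handles=shape_handles, title='Shape', loc='upper left')
ax.add_artist(shape_legend)  # keep the first legend when the second is created
colour_legend = ax.legend(handles=colour_handles, title='Colour', loc='upper right')
```

Call plt.show() afterwards to display the figure. Every point is drawn exactly once, because the legends are built from the proxy handles rather than from an extra scatter call.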
Upvotes: 1
Reputation: 6482
I would recommend that you use a package like seaborn, in particular the scatterplot function, which will simplify things for you a lot. By putting the data into a dictionary, your example can be reduced to:
import seaborn as sns
shape_markers = {
    "Shape1": "D",
    "Shape2": "^",
    "Shape3": "o",
    "Shape4": "s",
    "Shape5": "*",
}
colours = {
    "A": "C0",
    "B": "C1",
    "C": "C2",
}
data = {
    "r": [600, 2000, 980, 1770, 920, 1100, 220],
    "t": [2.7, 12.67, 10.54, 1.3, 16.1, 0.92, 13.56],
    "spectra": ["A", "A", "B", "A", "C", "B", "A"],
    "shape": ["Shape1", "Shape2", "Shape3", "Shape4", "Shape5", "Shape2", "Shape4"],
}
ax = sns.scatterplot(
    data,
    x="t",
    y="r",
    hue="spectra",
    palette=colours,
    style="shape",
    markers=shape_markers,
)
ax.figure.show()
Upvotes: 0
Reputation: 3096
This is the only way I found that keeps your code; I had to modify part of the input (the spectra types are now colour names).
I guess the mapping could equally be done the other way round:
import matplotlib.pyplot as plt
import numpy as np
r = np.array([600, 2000, 980, 1770, 920, 1100, 220])
t = np.array([2.7, 12.67, 10.54, 1.3, 16.1, 0.92, 13.56])
spectra_type = np.array(['red', 'red', 'blue', 'red', 'yellow', 'blue', 'red'])
spectra_num = np.array([{'red': 0, 'blue': 1, 'yellow': 2}[i] for i in spectra_type])
print(spectra_num)
i = np.array(['Shape1','Shape2','Shape3','Shape4','Shape5','Shape2','Shape4'])
shape1 = np.where(i=='Shape1')[0]
shape2 = np.where(i=='Shape2')[0]
shape3 = np.where(i=='Shape3')[0]
shape4 = np.where(i=='Shape4')[0]
shape5 = np.where(i=='Shape5')[0]
print(shape1, type(shape1))
print(spectra_num[shape1])
print(spectra_num[shape2])
print(spectra_num[shape3])
print(spectra_num[shape4])
print(spectra_num[shape5])
plt.figure('fig 1')
plt.xlabel('x')
plt.ylabel('y')
plt.scatter(t[shape1], r[shape1], c=spectra_type[shape1], marker='D', label='Shape1')
plt.scatter(t[shape2], r[shape2], c=spectra_type[shape2], marker='^', label='Shape2')
plt.scatter(t[shape3], r[shape3], c=spectra_type[shape3], marker='o', label='Shape3')
plt.scatter(t[shape4], r[shape4], c=spectra_type[shape4], marker='s', label='Shape4')
plt.scatter(t[shape5], r[shape5], c=spectra_type[shape5], marker='*', label='Shape5')
first_legend = plt.legend(loc='upper center')
for handle in first_legend.legend_handles:
    handle.set_facecolor('black')
plt.gca().add_artist(first_legend)
red = plt.Circle((0, 0), 0.1, color='red')
blue = plt.Circle((0, 0), 0.1, color='blue')
yellow = plt.Circle((0, 0), 0.1, color='yellow')
plt.legend(handles=[red, blue, yellow], labels=['red', 'blue', 'yellow'], title='Colour')
# scatter = plt.scatter(t, r, c=spectra_num)
# plt.legend(handles=scatter.legend_elements()[0], labels=['A', 'B', 'C'], title='Colour')
output:
I guess there is more than one way to do it, but most if not all of them cannot be achieved by superposing two different sets of scatter plots (the five per-shape plt.scatter calls plus the final scatter = ... call).
Maybe someone more knowledgeable will step in
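One possible way to drop the second, overlaid scatter while keeping the numeric spectra_num values, offered as a sketch rather than a definitive fix: as far as I can tell, the colours go wrong without it because each plt.scatter call scales its c values to the colormap independently, so a group holding a single value always gets the lowest colour. Passing the same vmin and vmax to every call pins the scale. The `markers` dictionary and the proxy `Line2D` handles are names I introduced for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

r = np.array([600, 2000, 980, 1770, 920, 1100, 220])
t = np.array([2.7, 12.67, 10.54, 1.3, 16.1, 0.92, 13.56])
spectra_num = np.array([0, 0, 1, 0, 2, 1, 0])  # A=0, B=1, C=2, as in the question
i = np.array(['Shape1', 'Shape2', 'Shape3', 'Shape4',
              'Shape5', 'Shape2', 'Shape4'])

# Assumed helper mapping from shape name to marker.
markers = {'Shape1': 'D', 'Shape2': '^', 'Shape3': 'o',
           'Shape4': 's', 'Shape5': '*'}

fig, ax = plt.subplots()
for name, marker in markers.items():
    idx = np.where(i == name)[0]
    # vmin/vmax fix the colour scale, so a value of 2 maps to the same
    # colour in every call, even when it is the only value in its group.
    sc = ax.scatter(t[idx], r[idx], c=spectra_num[idx],
                    vmin=0, vmax=2, marker=marker, label=name)

first_legend = ax.legend(loc='upper left')
ax.add_artist(first_legend)

# Colour legend from proxy handles, using the shared colormap and scale.
colour_handles = [
    Line2D([], [], marker='o', lw=0,
           color=sc.cmap(float(sc.norm(k))), label=lab)
    for k, lab in enumerate(['A', 'B', 'C'])
]
ax.legend(handles=colour_handles, title='Colour', loc='upper right')
```

With this, the extra scatter = plt.scatter(t, r, c=spectra_num) call is no longer needed, so no point is drawn twice.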
Upvotes: 0