Reputation: 25
I have the following two dataframes that I would like to plot together. The first one (data) contains the complete data of different groups for several repeated experiments (= replicates), with the values for the individual cells within each experiment. The second one (avgs) summarizes the mean of each replicate experiment for all groups. I basically want to plot my data in the way suggested here.
data.head()
   cell  replicate     value    group
0     1          1  0.029723  GROUP_A
1     1          2  0.019136  GROUP_A
2     2          2  0.020216  GROUP_A
3     3          1  0.032020  GROUP_B
4     3          2  0.044815  GROUP_B
avgs.head()
   replicate     value    group
0          1  0.019709  GROUP_A
1          2  0.018937  GROUP_A
2          1  0.358437  GROUP_B
3          2  0.269602  GROUP_B
4          3  0.303252  GROUP_B
My aim is to achieve either the plots shown in B or C, where the hue depends on both the group and replicate.
import matplotlib.pyplot as plt
import seaborn as sns
sns.swarmplot(x="group", y="value", data=data, hue="replicate")
sns.swarmplot(x="group", y="value", data=avgs,size=8,hue="replicate", edgecolor="k", linewidth=2)
will give me basically the plot shown in A, with the hue corresponding to the replicate.
Is there a way to do this with a different color palette for each group, so that each group has its own color and each replicate gets a different shade of that color (example B, made in Affinity Designer)?
An alternative that would work for me is to plot the single-cell values of data with a grey palette. But how can I then make sure that, when I add the replicate means from avgs, each group has a different color and each replicate mean has the corresponding shade of that color (example C)?
Is it possible to pass a palette dictionary to seaborn/matplotlib, e.g. something like:
gray = sns.dark_palette("gray", n_colors=5)
red = sns.dark_palette("red", n_colors=5)
blue = sns.dark_palette("blue", n_colors=5)
my_palette={"GROUP_A": gray, "GROUP_B": red, "GROUP_C": blue}
Thanks!
Upvotes: 1
Views: 3425
Reputation: 80534
The groups can be plotted separately, each with its own palette. To make sure the x-positions are respected, each call needs the order= keyword set to the full list of desired x-labels.
Seaborn automatically adds legend entries for each call, so the legend can get very large. You can either suppress the legend, or limit it to the first few entries.
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Generate example data: N cells spread over 3 replicates and 4 groups
N = 500
data = pd.DataFrame({'replicate': np.random.choice(range(1, 4), N),
                     'value': 2 + np.random.uniform(-0.5, 0.5, (N, 5)).sum(axis=1),
                     'group': np.random.choice([f'GROUP_{g}' for g in 'ABCD'], N)})
groups = np.unique(data.group)
for g in groups:
    # shift each group by a random offset so that the groups differ
    data.loc[data.group == g, 'value'] += np.random.uniform(0, 3)
# mean value per replicate and group
avgs = data.groupby(['replicate', 'group']).mean()
avgs.reset_index(inplace=True)

# one sequential colormap per group; the replicates become shades of it
my_palette = {"GROUP_A": 'Greys', "GROUP_B": 'Reds', "GROUP_C": 'Blues', "GROUP_D": 'Greens'}
for g in groups:
    # order=groups keeps every group at its own x-position
    sns.swarmplot(x="group", y="value", data=data[data.group == g], order=groups,
                  palette=my_palette[g], hue="replicate")
    sns.swarmplot(x="group", y="value", data=avgs[avgs.group == g], order=groups,
                  size=8, palette=my_palette[g], hue="replicate", edgecolor="k", linewidth=2)
# plt.gca().legend_.remove()  # optionally suppress the legend
# keep only the first three legend entries (the replicates of the first group)
handles, labels = plt.gca().get_legend_handles_labels()
plt.legend(handles=handles[:3], title='replicate')
plt.tight_layout()
plt.show()
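As a possible alternative to looping over the groups, seaborn's palette= argument also accepts a dictionary that maps each hue level to a color. The following is only a minimal sketch of how such a dictionary could be built, with one key per group/replicate combination. The helper names (shades, group_replicate_palette), the base colors, the lightness range and the "GROUP_A_1" key format are all assumptions made for illustration, not part of the code above. Only the standard library is used to compute the colors.

```python
import colorsys


def shades(base_rgb, n, lightest=0.75, darkest=0.35):
    """Return n RGB tuples sharing the hue of base_rgb, ordered light to dark."""
    h, _, s = colorsys.rgb_to_hls(*base_rgb)
    if n == 1:
        return [colorsys.hls_to_rgb(h, (lightest + darkest) / 2, s)]
    step = (lightest - darkest) / (n - 1)
    return [colorsys.hls_to_rgb(h, lightest - i * step, s) for i in range(n)]


def group_replicate_palette(base_colors, replicates):
    """Map keys like 'GROUP_A_1' to one shade of the group's base color."""
    palette = {}
    for group, rgb in base_colors.items():
        for rep, color in zip(replicates, shades(rgb, len(replicates))):
            palette[f"{group}_{rep}"] = color
    return palette


# Hypothetical base colors (RGB components in the range 0..1)
base_colors = {
    "GROUP_A": (0.5, 0.5, 0.5),   # gray
    "GROUP_B": (0.8, 0.1, 0.1),   # red
    "GROUP_C": (0.1, 0.3, 0.8),   # blue
    "GROUP_D": (0.1, 0.6, 0.2),   # green
}
palette = group_replicate_palette(base_colors, replicates=[1, 2, 3])
```

To use it, the dataframes would need a combined hue column whose values match the dictionary keys, for example data["group_rep"] = data["group"] + "_" + data["replicate"].astype(str) (group_rep is a made-up column name), and the plot call would then take hue="group_rep", palette=palette. The legend would in that case get one entry per group/replicate combination, so it may still need trimming as described above.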
Upvotes: 3