Reputation: 8124
I have this pandas DataFrame:
>>> print(df)
Channel     0     1     2     3     4     5     6     7
Sample
7d       3.82  4.10  3.86  3.86  3.95  3.65  3.43  3.63
12d      2.97  4.32  3.50  3.58  3.22  3.37  3.58  3.78
17d      4.01  4.04  4.10  3.43  3.76  3.26  3.35  3.48
DO       3.07  3.58  3.14  3.22  3.11  3.09  3.16  3.16
I want to do a plot similar to this (the code is sns.swarmplot(df)):
But the colors should be set not per-channel (i.e. DataFrame column) but per-sample (i.e. DataFrame rows). So each "category" on the x-axis will have 4 colors corresponding to the rows 7d, 12d, 17d and DO.
Is there an easy way to accomplish this in seaborn?
EDIT: I should add that I tried using the hue keyword, but it says it also requires the x and y keywords. According to this example, it seems that I need to create a new DataFrame with all numeric values in one column and two other columns holding the sample and channel information. Then I can call the plot as sns.swarmplot(x='Channel', y='values', hue='Sample'). Is there a more direct way that does not involve creating an additional ad-hoc DataFrame?
EDIT2: Following @BrenBarn's suggestion, I ended up creating a new "tidy" DataFrame with:
dd = []
for sa in df.index:
    print(sa)
    d = pd.DataFrame(df.loc[sa]).reset_index()
    d.columns = ['Channel', 'Leakage']
    d['Sample'] = sa
    dd.append(d)
ddf = pd.concat(dd)
And then plotting the data with:
sns.swarmplot(x='Channel', y='Leakage', hue='Sample', data=ddf)
which gives the plot I expected:
I was hoping there was a way to tell seaborn to use the original "2-D table" format for the plot, which is much more compact and natural for this kind of data. If this is possible I would accept the answer ;).
Upvotes: 4
Views: 12988
Reputation: 1375
You've basically answered your question in the edit, but you may want to look at pd.melt or DataFrame.stack as an easier way of creating your new tidy DataFrame.
e.g.
s = df.stack()
s.name = 'values'
df_tidy = s.reset_index()
# Column names are case-sensitive: the index level is named 'Sample'
sns.stripplot(data=df_tidy, hue='Sample', x='Channel', y='values')
or
df_tidy = pd.melt(df.reset_index(), id_vars=['Sample'], value_vars=df.columns.tolist(), var_name='Channel', value_name='values')
sns.stripplot(data=df_tidy, hue='Sample', x='Channel', y='values')
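For reference, here is a self-contained sketch of both routes using the data from the question. It rebuilds df by hand, assuming the index is named 'Sample' and the columns are named 'Channel' as the printout suggests, and it borrows the 'Leakage' column name from the question's EDIT2. The helper plot_by_sample is a hypothetical name used only for illustration.

```python
import pandas as pd

# Rebuild the wide table from the question (assumed index/column names).
df = pd.DataFrame(
    [
        [3.82, 4.10, 3.86, 3.86, 3.95, 3.65, 3.43, 3.63],
        [2.97, 4.32, 3.50, 3.58, 3.22, 3.37, 3.58, 3.78],
        [4.01, 4.04, 4.10, 3.43, 3.76, 3.26, 3.35, 3.48],
        [3.07, 3.58, 3.14, 3.22, 3.11, 3.09, 3.16, 3.16],
    ],
    index=pd.Index(["7d", "12d", "17d", "DO"], name="Sample"),
    columns=pd.Index(range(8), name="Channel"),
)

# Route 1: stack the columns into the index, then flatten to columns.
tidy_stack = df.stack().rename("Leakage").reset_index()

# Route 2: melt, keeping 'Sample' as the identifier column.
tidy_melt = df.reset_index().melt(
    id_vars="Sample", var_name="Channel", value_name="Leakage"
)


def plot_by_sample(tidy):
    """Hypothetical helper: one swarm per channel, colored by sample."""
    import seaborn as sns  # imported lazily so the reshaping runs without it

    return sns.swarmplot(data=tidy, x="Channel", y="Leakage", hue="Sample")
```

Both frames hold the same 32 rows with columns Sample, Channel and Leakage; only the row order differs (stack goes row by row, melt goes column by column), which should not matter for the plot. Calling plot_by_sample(tidy_stack) then draws the per-sample coloring.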
Upvotes: 3