user2304916

Reputation: 8124

Seaborn categorical plot with hue from DataFrame rows

I have this pandas DataFrame:

>>> print(df)
Channel     0     1     2     3     4     5     6     7
Sample                                                 
7d       3.82  4.10  3.86  3.86  3.95  3.65  3.43  3.63
12d      2.97  4.32  3.50  3.58  3.22  3.37  3.58  3.78
17d      4.01  4.04  4.10  3.43  3.76  3.26  3.35  3.48
DO       3.07  3.58  3.14  3.22  3.11  3.09  3.16  3.16
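To reproduce the examples in this thread, the table above can be rebuilt as follows. This is a sketch that assumes the channel labels are the integers 0 to 7 (the printout does not show whether they are integers or strings); the index and column axes are named Sample and Channel to match the printed header.

```python
import pandas as pd

# Values copied from the printed table; integer channel labels are an assumption
data = {
    "7d":  [3.82, 4.10, 3.86, 3.86, 3.95, 3.65, 3.43, 3.63],
    "12d": [2.97, 4.32, 3.50, 3.58, 3.22, 3.37, 3.58, 3.78],
    "17d": [4.01, 4.04, 4.10, 3.43, 3.76, 3.26, 3.35, 3.48],
    "DO":  [3.07, 3.58, 3.14, 3.22, 3.11, 3.09, 3.16, 3.16],
}
# orient="index" turns each dict key into a row; columns become 0..7
df = pd.DataFrame.from_dict(data, orient="index")
df.index.name = "Sample"     # name of the row axis
df.columns.name = "Channel"  # name of the column axis
print(df)
```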

I want to make a plot similar to this one (produced by sns.swarmplot(df)):

[Image: swarm plot produced by sns.swarmplot(df), colored per channel]

But the colors should be set not per channel (i.e. per DataFrame column) but per sample (i.e. per DataFrame row). So each "category" on the x-axis will have 4 colors, corresponding to the rows 7d, 12d, 17d and DO.

Is there an easy way to accomplish this in seaborn?

EDIT: I should add that I tried using the hue keyword, but seaborn says it also requires the x and y keywords. According to this example, it seems that I need to create a new DataFrame with all numeric values in one column and two other columns holding the sample and channel information. Then I can call the plot as sns.swarmplot(x='Channel', y='values', hue='Sample'), passing the new DataFrame as data. Is there a more direct way that does not involve creating an additional ad-hoc DataFrame?

EDIT2: Following @BrenBarn's suggestion, I ended up creating a new "tidy" DataFrame with:

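import pandas as pd

dd = []
for sa in df.index:
    # Row `sa` as a two-column table: channel label and measured value
    d = df.loc[sa].reset_index()
    d.columns = ['Channel', 'Leakage']
    d['Sample'] = sa  # tag every row with the sample it came from
    dd.append(d)
# ignore_index=True avoids repeating the 0..7 index once per sample
ddf = pd.concat(dd, ignore_index=True)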

And then plotting the data with:

sns.swarmplot(x='Channel', y='Leakage', hue='Sample', data=ddf)

which gives the plot I expected:

[Image: swarm plot of ddf with points colored by Sample]

I was hoping there was a way to tell seaborn to use the original "2-D table" format directly, since it is much more compact and natural for this kind of data. If that is possible, I will accept the answer ;).
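Since hue needs long-form data (as the first edit notes), one compact option is to keep the 2-D table as the source of truth and hide the reshaping in a small helper. The helper name to_tidy and its value_name parameter are hypothetical, not part of pandas or seaborn; the sketch assumes the row and column axes are named Sample and Channel as in the printout, with integer channel labels.

```python
import pandas as pd

def to_tidy(wide, value_name="value"):
    """Hypothetical helper: reshape a wide table (rows = hue groups,
    columns = x categories) into one row per cell."""
    # stack() moves the column labels into the index; the axis names
    # ("Sample", "Channel") become the key columns after reset_index()
    return wide.stack().rename(value_name).reset_index()

# Table rebuilt from the printed values (integer channels assumed)
df = pd.DataFrame(
    [[3.82, 4.10, 3.86, 3.86, 3.95, 3.65, 3.43, 3.63],
     [2.97, 4.32, 3.50, 3.58, 3.22, 3.37, 3.58, 3.78],
     [4.01, 4.04, 4.10, 3.43, 3.76, 3.26, 3.35, 3.48],
     [3.07, 3.58, 3.14, 3.22, 3.11, 3.09, 3.16, 3.16]],
    index=pd.Index(["7d", "12d", "17d", "DO"], name="Sample"),
    columns=pd.Index(range(8), name="Channel"),
)

tidy = to_tidy(df, "Leakage")
print(tidy.head())
```

With seaborn installed, the plot call then stays a one-liner on top of the 2-D table, e.g. sns.swarmplot(data=to_tidy(df, "Leakage"), x="Channel", y="Leakage", hue="Sample").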

Upvotes: 4

Views: 12988

Answers (1)

Victor Chubukov

Reputation: 1375

You've basically answered your own question in the edit, but you may want to look at pd.melt or DataFrame.stack as an easier way of creating your new tidy DataFrame.

e.g.

s = df.stack()
s.name = 'values'
df_tidy = s.reset_index()
sns.stripplot(data=df_tidy, hue='Sample', x='Channel', y='values')

or

df_tidy = pd.melt(df.reset_index(), id_vars=['Sample'], value_vars=df.columns.tolist(), var_name='Channel', value_name='values')
sns.stripplot(data=df_tidy, hue='Sample', x='Channel', y='values')
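The stack and melt routes should yield the same tidy table, only in a different row order (stack walks the table row by row, melt column by column). A quick check, sketched with df rebuilt from the question's printout and assuming integer channel labels:

```python
import pandas as pd

df = pd.DataFrame(
    [[3.82, 4.10, 3.86, 3.86, 3.95, 3.65, 3.43, 3.63],
     [2.97, 4.32, 3.50, 3.58, 3.22, 3.37, 3.58, 3.78],
     [4.01, 4.04, 4.10, 3.43, 3.76, 3.26, 3.35, 3.48],
     [3.07, 3.58, 3.14, 3.22, 3.11, 3.09, 3.16, 3.16]],
    index=pd.Index(["7d", "12d", "17d", "DO"], name="Sample"),
    columns=pd.Index(range(8), name="Channel"),
)

# Route 1: stack() gives a Series indexed by (Sample, Channel)
s = df.stack()
s.name = "values"
via_stack = s.reset_index()

# Route 2: melt() needs the row labels as an ordinary column first
via_melt = pd.melt(df.reset_index(), id_vars=["Sample"],
                   var_name="Channel", value_name="values")
# reset_index() mixed the string "Sample" into the integer column labels,
# so melt's Channel column can come back as object dtype; cast it back
via_melt["Channel"] = via_melt["Channel"].astype("int64")

# Sort both the same way before comparing, since row order differs
key = ["Sample", "Channel"]
a = via_stack.sort_values(key).reset_index(drop=True)
b = via_melt.sort_values(key).reset_index(drop=True)
pd.testing.assert_frame_equal(a, b)
print("stack and melt agree on", len(a), "rows")
```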

Upvotes: 3
