Pugl

Reputation: 452

Seaborn stripplot based on columns

I have the following pandas dataframe:

Group   Exp1          Exp2        Exp3         Exp4     Exp5    Control
0   1   0.005556    -0.101111   0.052632    -0.055556   0.033333    y
1   2   -0.115684   0.076667    -0.349497   0.555556    0.555556    n
2   3   0.184444    0.251397    0.022222    -0.444444   0.611650    n
3   4   0.075556    0.237778    0.368750    0.098901    -0.111111   n
4   5   -0.186916   -0.355556   0.172414    0.087120    0.034737    y
5   6   0.250000    0.152542    -0.395349   0.111111    0.000000    n
6   7   -0.025014   0.030000    0.594444    0.055556    0.311111    n
7   8   -0.062500   0.123333    0.317778    0.144444    0.288889    n
8   9   0.001111    0.141111    0.181111    0.011111    0.435897    n
9   10  -0.124444   -0.074241   0.074444    -0.111111   0.133333    y

Now, the typical seaborn stripplot uses the rows to plot different categories. However, I would like the categories to be the columns (the different experiments), with the 10 values (one per group) for each experiment plotted vertically above that experiment's marker on the x-axis. How do I achieve this?

Upvotes: 1

Views: 1997

Answers (2)

JohanC

Reputation: 80279

Seaborn usually works most easily with "long-form" data, i.e. one column indicating the experiment and another holding the corresponding values. Seaborn also accepts some kinds of "wide-form" data when the dataframe is simply structured enough. In this case, converting the "Group" column to the index does the job: each remaining column is then plotted as its own category.

So the code looks like:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({f'Exp{i}': np.random.randn(10) for i in range(1, 6)})
df['Group'] = range(1, 11)
ax = sns.stripplot(data=df.set_index('Group'))
ax.xaxis.tick_top()
plt.show()

[Image: example plot]

The wide form doesn't support hue in this case: sns.stripplot(data=df.drop(columns=['Group', 'Control']), hue=df['Control']) gives an error saying that hue is not supported when x and y are not explicitly set.

The "long form", however, does support hue. Pandas melt() converts a dataframe to the desired long form:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from io import StringIO

data_str = '''   Group   Exp1          Exp2        Exp3         Exp4     Exp5    Control
0   1   0.005556    -0.101111   0.052632    -0.055556   0.033333    y
1   2   -0.115684   0.076667    -0.349497   0.555556    0.555556    n
2   3   0.184444    0.251397    0.022222    -0.444444   0.611650    n
3   4   0.075556    0.237778    0.368750    0.098901    -0.111111   n
4   5   -0.186916   -0.355556   0.172414    0.087120    0.034737    y
5   6   0.250000    0.152542    -0.395349   0.111111    0.000000    n
6   7   -0.025014   0.030000    0.594444    0.055556    0.311111    n
7   8   -0.062500   0.123333    0.317778    0.144444    0.288889    n
8   9   0.001111    0.141111    0.181111    0.011111    0.435897    n
9   10  -0.124444   -0.074241   0.074444    -0.111111   0.133333    y'''

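# sep=r'\s+' splits on any run of whitespace (delim_whitespace=True is deprecated since pandas 2.2)
df = pd.read_csv(StringIO(data_str), sep=r'\s+')
# The wide-form attempt with hue fails:
# sns.stripplot(data=df.drop(columns=['Control']), hue=df['Control'])

# One row per (Group, Experiment) pair; Group and Control stay as identifier columns
long_df = df.melt(id_vars=['Group', 'Control'], var_name='Experiment', value_name='Value')
ax = sns.stripplot(data=long_df, x='Experiment', y='Value', hue='Control')
ax.xaxis.tick_top()  # move the experiment tick labels to the top of the plot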
plt.tight_layout()
plt.show()

[Image: long form]

The long form of the dataframe looks like:

   Group Control Experiment     Value
0      1       y       Exp1  0.005556
1      2       n       Exp1 -0.115684
2      3       n       Exp1  0.184444
3      4       n       Exp1  0.075556
4      5       y       Exp1 -0.186916
...
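The reshaping can be checked without plotting at all. A minimal sketch, rebuilding only the first three groups of the question's data to keep it short (that trimming is an assumption of the example): melt() stacks the five Exp columns into a single Value column, so the row count is groups x experiments, and pivot() reverses the operation.

```python
import pandas as pd

# First three groups of the question's data (trimmed for brevity)
df = pd.DataFrame({
    'Group': [1, 2, 3],
    'Exp1': [0.005556, -0.115684, 0.184444],
    'Exp2': [-0.101111, 0.076667, 0.251397],
    'Exp3': [0.052632, -0.349497, 0.022222],
    'Exp4': [-0.055556, 0.555556, -0.444444],
    'Exp5': [0.033333, 0.555556, 0.611650],
    'Control': ['y', 'n', 'n'],
})

long_df = df.melt(id_vars=['Group', 'Control'], var_name='Experiment', value_name='Value')
print(long_df.shape)          # (15, 4): 3 groups x 5 experiments, 4 columns
print(list(long_df.columns))  # ['Group', 'Control', 'Experiment', 'Value']

# pivot() undoes the melt: one row per Group, one column per Experiment
back = long_df.pivot(index='Group', columns='Experiment', values='Value')
print(back.loc[2, 'Exp4'])    # 0.555556
```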

Upvotes: 1

mwaskom

Reputation: 48992

Drop the column you don't want to plot and pass the rest to data:

sns.stripplot(data=df.drop("Group", axis=1))


It's good to learn how to do the full transformation to long-form data that @JohanC demonstrates, but also good to know how to take advantage of the wide-form data support when it fits with what you want to do.
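The question's frame also contains the string column Control, so dropping only Group still leaves a non-numeric column in the wide-form input. Whether seaborn silently skips such a column may depend on the version (an assumption not verified here), so a cautious sketch selects the numeric columns explicitly with pandas before plotting:

```python
import pandas as pd
from io import StringIO

data_str = '''Group Exp1 Exp2 Exp3 Exp4 Exp5 Control
1 0.005556 -0.101111 0.052632 -0.055556 0.033333 y
2 -0.115684 0.076667 -0.349497 0.555556 0.555556 n
3 0.184444 0.251397 0.022222 -0.444444 0.611650 n
4 0.075556 0.237778 0.368750 0.098901 -0.111111 n
5 -0.186916 -0.355556 0.172414 0.087120 0.034737 y
6 0.250000 0.152542 -0.395349 0.111111 0.000000 n
7 -0.025014 0.030000 0.594444 0.055556 0.311111 n
8 -0.062500 0.123333 0.317778 0.144444 0.288889 n
9 0.001111 0.141111 0.181111 0.011111 0.435897 n
10 -0.124444 -0.074241 0.074444 -0.111111 0.133333 y'''
df = pd.read_csv(StringIO(data_str), sep=r'\s+')

# Dropping only Group still leaves the string column Control
print(list(df.drop("Group", axis=1).columns))
# ['Exp1', 'Exp2', 'Exp3', 'Exp4', 'Exp5', 'Control']

# Keep numeric columns (this excludes Control), then drop Group
numeric = df.select_dtypes('number').drop(columns='Group')
print(list(numeric.columns))  # ['Exp1', 'Exp2', 'Exp3', 'Exp4', 'Exp5']
print(numeric.shape)          # (10, 5): 10 groups x 5 experiments
# numeric can then be passed as sns.stripplot(data=numeric)
```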

Upvotes: 3
