Plotting the mean of multiple columns including standard deviation

Question

I have a data set with 8 columns and several rows. The data contain measurements of different variables (6 in total, one per row) under 2 different conditions, each consisting of 4 columns of repeated measurements.

Using Seaborn, I would like to generate a bar chart displaying the mean and standard deviation of each group of 4 columns, grouped by index key (i.e. measured variable). The dataframe structure is as follows:

import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame({
    'S1_1': np.random.randn(6),
    'S1_2': np.random.randn(6),
    'S1_3': np.random.randn(6),
    'S1_4': np.random.randn(6),
    'S2_1': np.random.randn(6),
    'S2_2': np.random.randn(6),
    'S2_3': np.random.randn(6),
    'S2_4': np.random.randn(6),
}, index=['var1', 'var2', 'var3', 'var4', 'var5', 'var6'])

How do I tell Seaborn that I want only 2 bars per variable, one for the first 4 columns and one for the second 4, with each bar displaying the mean (and standard deviation or some other measure of dispersion) across its 4 columns?

I was thinking of using multi-indexing, adding a second column level to group the columns into 2 conditions:

df.columns = pd.MultiIndex.from_arrays([['Condition 1'] * 4 + ['Condition 2'] * 4,df.columns])

but I can't figure out what I should pass to Seaborn to generate the plot I want.

If anyone could point me in the right direction, that would be a great help!

Trenton McKinney · Accepted Answer

Update Based on Comment

Plotting is all about reshaping the dataframe to fit the plot API.

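import pandas as pd
import seaborn as sns

# still create the groups: chunk the columns into runs of n
cols = df.columns
n = 4
groups = [cols[i:i+n] for i in range(0, len(cols), n)]

# transpose each group and add an id column
data_list = []
for group in groups:
    # second character of the first column name, e.g. 'S2_1' -> '2'
    # (assumes flat string column names, not the MultiIndex from the question)
    id_ = group[0][1]
    data = df[group].copy().T
    data['id_'] = id_
    data_list.append(data)

df2 = pd.concat(data_list, axis=0).reset_index()
df2 = df2.rename(columns={'index': 'sample'})

# melt df2 into a long form
dfm = df2.melt(id_vars=['sample', 'id_'])

# plot; ci='sd' draws the standard deviation as the error bar
# (seaborn 0.12 and later deprecate ci= in favour of errorbar='sd')
p = sns.catplot(kind='bar', data=dfm, x='variable', y='value', hue='id_', ci='sd', aspect=3)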

`df2.head()`

  sample    YAL001C    YAL002W   YAL004W   YAL005C   YAL007C   YAL008W    YAL011W   YAL012W    YAL013W   YAL014C id_
0   S2_1 -13.062716  -8.084685  2.360795 -0.740357  3.086768 -0.117259  -5.678183  2.527573 -17.326287 -1.319402   2
1   S2_2  -5.431474 -12.676807  0.070569 -4.214761 -4.318011 -4.489010 -10.268632  0.691448 -24.189106 -2.343884   2
2   S2_3  -9.365509 -12.281169  0.497772 -3.228236  0.212941 -2.287206 -10.250004  1.111842 -27.811564 -4.329987   2
3   S2_4  -7.582111 -15.587219 -1.286167 -4.531494 -3.090265 -4.718281  -8.933496  2.079757 -21.580854 -2.834441   2
4   S3_1 -12.618254 -20.010779 -2.530541 -3.203072 -2.436503 -2.922565 -15.972632  3.551605 -35.618485 -4.925495   3

`dfm.head()`

  sample id_ variable      value
0   S2_1   2  YAL001C -13.062716
1   S2_2   2  YAL001C  -5.431474
2   S2_3   2  YAL001C  -9.365509
3   S2_4   2  YAL001C  -7.582111
4   S3_1   3  YAL001C -12.618254
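
The same long form can probably also be reached from the column MultiIndex proposed in the question. The following is only a sketch under assumptions: it rebuilds a random frame shaped like the question's sample, the level names `condition` and `sample` are made up for illustration, and the summary uses pandas' sample standard deviation, which may not match the convention of every seaborn version.

```python
import numpy as np
import pandas as pd

# Sample frame shaped like the one in the question (values are random).
np.random.seed(10)
df = pd.DataFrame(
    np.random.randn(6, 8),
    index=[f'var{i}' for i in range(1, 7)],
    columns=[f'S{c}_{r}' for c in (1, 2) for r in range(1, 5)],
)

# Column MultiIndex proposed in the question; the level names are assumptions.
df.columns = pd.MultiIndex.from_arrays(
    [['Condition 1'] * 4 + ['Condition 2'] * 4, df.columns],
    names=['condition', 'sample'],
)

# Melt both column levels into long form, keeping the row labels.
long_df = (
    df.melt(ignore_index=False, value_name='value')
      .rename_axis('variable')
      .reset_index()
)

# Mean and sample standard deviation (pandas default, ddof=1) per bar.
summary = long_df.groupby(['variable', 'condition'])['value'].agg(['mean', 'std'])
print(summary.round(3))
```

Passing `long_df` to the `catplot` call with `x='variable'`, `y='value'` and `hue='condition'` should then give two bars per variable, one per condition.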

Plot Result

`kind='box'`

A box plot might be a better way to convey the distribution.

p = sns.catplot(kind='box', data=dfm, y='variable', x='value', hue='id_', height=12)

Original Answer

- Use a list comprehension to chunk the columns into groups of 4.
  - This uses the original, more comprehensive data that was posted. It can be found in revision 4.
- Create a figure with subplots and zip each group to an ax from axes.
- Use each group to select data from df and transpose the data with .T.
- With sns.barplot the default estimator is the mean, so the length of each bar is the mean; set ci='sd' so the error bar shows the standard deviation instead of a confidence interval.
  - sns.barplot(data=data, ci='sd', ax=ax) can easily be replaced with sns.boxplot(data=data, ax=ax)
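
To see roughly what each bar and error bar represents, the steps above can be sketched without any plotting. This is an illustration under assumptions, not part of the original answer: it uses a random frame shaped like the question's sample rather than the original data, the `summaries` dictionary and its labels are hypothetical, and `std` here is pandas' sample standard deviation, which may differ slightly from what a given seaborn version draws.

```python
import numpy as np
import pandas as pd

# Random frame shaped like the question's sample: 6 variables, 2 x 4 replicates.
np.random.seed(10)
df = pd.DataFrame(
    np.random.randn(6, 8),
    index=[f'var{i}' for i in range(1, 7)],
    columns=[f'S{c}_{r}' for c in (1, 2) for r in range(1, 5)],
)

# Chunk the columns into groups of n.
n = 4
groups = [df.columns[i:i + n] for i in range(0, len(df.columns), n)]

# For each group, transpose so replicates are rows and variables are columns,
# then compute the per-variable mean (bar length) and standard deviation.
summaries = {}
for group in groups:
    data = df[group].T
    label = group[0].split('_')[0]  # hypothetical label, e.g. 'S1_1' -> 'S1'
    summaries[label] = pd.DataFrame({'mean': data.mean(), 'std': data.std()})

for label, summary in summaries.items():
    print(label)
    print(summary.round(3))
```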

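import matplotlib.pyplot as plt
import seaborn as sns

# using the first comma separated data that was posted, create groups of 4
cols = df.columns
n = 4  # chunk size for groups
groups = [cols[i:i+n] for i in range(0, len(cols), n)]
num_gps = len(groups)

# one subplot per group; squeeze=False keeps axes an array even for a single group
fig, axes = plt.subplots(num_gps, 1, figsize=(12, 6*num_gps), squeeze=False)

for ax, group in zip(axes.ravel(), groups):
    data = df[group].T  # rows: replicate columns, columns: variables
    # bar length is the column mean; ci='sd' draws the standard deviation
    # (seaborn 0.12 and later deprecate ci= in favour of errorbar='sd')
    sns.barplot(data=data, ci='sd', ax=ax)
    ax.set_title(f'{group.to_list()}')
fig.tight_layout()
fig.savefig('test.png')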

Example of `data`

In the plot, each bar is the mean of one column of `data`, and the line is its standard deviation.

       YAL001C    YAL002W   YAL004W   YAL005C   YAL007C   YAL008W    YAL011W   YAL012W    YAL013W   YAL014C
S8_1 -1.731388 -17.215712 -3.518643 -2.358103  0.418170 -1.529747 -12.630343  2.435674 -27.471971 -4.021264
S8_2 -1.325524 -24.056632 -0.984390 -2.119338 -1.770665 -1.447103 -10.618954  2.156420 -30.362998 -4.735058
S8_3 -2.024020 -29.094027 -6.146880 -2.101090 -0.732322 -2.773949 -12.642857 -0.009749 -28.486835 -4.783863
S8_4  2.541671 -13.599049 -2.688125 -2.329332 -0.694555 -2.820627  -8.498677  3.321018 -31.741916 -2.104281
