Alexis

Reputation: 2304

Boxplot by two groups in pandas

I have the following dataset:

df_plots = pd.DataFrame({'Group':['A','A','A','A','A','A','B','B','B','B','B','B'],
                         'Type':['X','X','X','Y','Y','Y','X','X','X','Y','Y','Y'],
                         'Value':[1,1.2,1.4,1.3,1.8,1.5,15,19,18,17,12,13]})
df_plots
    Group   Type    Value
0   A       X       1.0
1   A       X       1.2
2   A       X       1.4
3   A       Y       1.3
4   A       Y       1.8
5   A       Y       1.5
6   B       X       15.0
7   B       X       19.0
8   B       X       18.0
9   B       Y       17.0
10  B       Y       12.0
11  B       Y       13.0

I want to create one boxplot per Group (there are two in the example), and within each plot show one box per Type. I have tried this:

fig, axs = plt.subplots(1,2,figsize=(8,6), sharey=False)
axs = axs.flatten()

for i, g in enumerate(df_plots[['Group','Type','Value']].groupby(['Group','Type'])):
    g[1].boxplot(ax=axs[i])
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-12-8e1150950024> in <module>
      3 
      4 for i, g in enumerate(df[['Group','Type','Value']].groupby(['Group','Type'])):
----> 5     g[1].boxplot(ax=axs[i])

IndexError: index 2 is out of bounds for axis 0 with size 2

Then I tried this:

fig, axs = plt.subplots(1,2,figsize=(8,6), sharey=False)
axs = axs.flatten()

for i, g in enumerate(df_plots[['Group','Type','Value']].groupby(['Group','Type'])):
    g[1].boxplot(ax=axs[i], by=['Group','Type'])

But I get the same error. The expected result should have only two plots, each with one box-and-whisker per Type. This is a sketch of the idea:

[image: sketch of the expected result]

Any help will be greatly appreciated. With this code I can control some aspects of the data that I can't with seaborn.

Upvotes: 5

Views: 7744

Answers (4)

tdy

Reputation: 41327

As @Prune mentioned, the immediate issue is that your groupby() returns four groups (AX, AY, BX, BY), so first fix the indexing and then clean up a couple more issues:

  1. Change axs[i] to axs[i//2] to put groups 0 and 1 on axs[0] and groups 2 and 3 on axs[1].
  2. Add positions=[i] to place the boxplots side by side rather than stacked.
  3. Set the title and xticklabels after plotting (I'm not aware of how to do this in the main loop).

for i, g in enumerate(df_plots.groupby(['Group', 'Type'])):
    g[1].boxplot(ax=axs[i//2], positions=[i])

for i, ax in enumerate(axs):
    ax.set_title('Group: ' + df_plots['Group'].unique()[i])
    ax.set_xticklabels(['Type: X', 'Type: Y'])
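
To make the i//2 indexing concrete, here is a small sketch (the name placement is only for illustration) that lists which axis index and which position each of the four groups would receive, without drawing anything:

```python
import pandas as pd

df_plots = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B'],
    'Type': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'X', 'X', 'X', 'Y', 'Y', 'Y'],
    'Value': [1, 1.2, 1.4, 1.3, 1.8, 1.5, 15, 19, 18, 17, 12, 13]
})

# Map each (Group, Type) key to (axis index, box position).
# i // 2 sends groups 0 and 1 to axis 0 and groups 2 and 3 to axis 1.
placement = {
    key: (i // 2, i)
    for i, (key, _) in enumerate(df_plots.groupby(['Group', 'Type']))
}

for key, (axis_index, position) in placement.items():
    print(key, '-> axs[%d], position %d' % (axis_index, position))
```

This only works because every Group has exactly two Types here; with a different number of Types per Group the divisor would have to change.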

boxplot output


Note that mileage may vary depending on version:

                        matplotlib.__version__   pd.__version__
confirmed working       3.4.2                    1.3.1
confirmed not working   3.0.1                    1.2.4
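
If the pandas wrapper misbehaves on a given version, one possible workaround is to call matplotlib's Axes.boxplot directly. This is only a sketch of that alternative, not the approach used above: it draws one axis per Group and one box per Type, and sets the tick labels afterwards. The Agg backend line is an assumption so that the snippet runs without a display; drop it in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

df_plots = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B'],
    'Type': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'X', 'X', 'X', 'Y', 'Y', 'Y'],
    'Value': [1, 1.2, 1.4, 1.3, 1.8, 1.5, 15, 19, 18, 17, 12, 13]
})

groups = sorted(df_plots['Group'].unique())
fig, axs = plt.subplots(1, len(groups), figsize=(8, 6), sharey=False)

for ax, group in zip(axs, groups):
    subset = df_plots[df_plots['Group'] == group]
    types = sorted(subset['Type'].unique())
    # One array of values per Type; each array becomes one box.
    data = [subset.loc[subset['Type'] == t, 'Value'].to_numpy() for t in types]
    ax.boxplot(data)
    ax.set_xticklabels(['Type: ' + t for t in types])
    ax.set_title('Group: ' + group)

fig.tight_layout()
```

Because each axis is an ordinary matplotlib Axes, any further styling (colors, limits, annotations) can be applied inside the loop.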

Upvotes: 3

Henry Ecker

Reputation: 35646

We can use groupby boxplot to create subplots per Group and then separate each boxplot by Type:

fig, axes = plt.subplots(1, 2, figsize=(8, 6), sharey=False)
df_plots.groupby('Group').boxplot(by='Type', ax=axes)
plt.show()

Or without subplots by passing parameters directly through the function call:

axes = df_plots.groupby('Group').boxplot(by='Type', figsize=(8, 6),
                                         layout=(1, 2), sharey=False)
plt.show()

plot


Data and imports:

import pandas as pd
from matplotlib import pyplot as plt

df_plots = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B'],
    'Type': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'X', 'X', 'X', 'Y', 'Y', 'Y'],
    'Value': [1, 1.2, 1.4, 1.3, 1.8, 1.5, 15, 19, 18, 17, 12, 13]
})

Upvotes: 7

mozway

Reputation: 260975

Use seaborn.catplot:

import seaborn as sns
sns.catplot(data=df_plots, kind='box', col='Group', x='Type', y='Value', hue='Type', sharey=False, height=4)


Upvotes: 4

Prune

Reputation: 77857

The immediate problem is that your groupby operation returns four elements (AX, AY, BX, BY), which you're trying to plot individually. You try to use ax=axs[i] ... but i runs 0-3, while you have only the two elements in your flattened structure. There is no axs[2] or axs[3], which raises the given run-time exception.

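You need to resolve your referencing one way or the other: either map the four groups onto the two axes, or group by Group alone so that the number of groups matches the number of axes.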
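
The mismatch can be checked before plotting. A minimal sketch (the variable names are illustrative) that compares the number of groups with the number of axes requested from plt.subplots(1, 2):

```python
import pandas as pd

df_plots = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B'],
    'Type': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'X', 'X', 'X', 'Y', 'Y', 'Y'],
    'Value': [1, 1.2, 1.4, 1.3, 1.8, 1.5, 15, 19, 18, 17, 12, 13]
})

n_axes = 2  # plt.subplots(1, 2) creates two axes

# Grouping by both columns produces one group per (Group, Type) pair.
groups_by_both = df_plots.groupby(['Group', 'Type']).ngroups

# Grouping by Group alone produces one group per axis.
groups_by_group = df_plots.groupby('Group').ngroups

print(groups_by_both, 'groups for', n_axes, 'axes')
print(groups_by_group, 'groups for', n_axes, 'axes')
```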

Upvotes: 2
