edesz
edesz

Reputation: 12396

Python Side-by-side box plots on same figure

I am trying to generate a box plot in Python 2.7 for each categorical value in column E from the Pandas dataframe below

          A         B         C         D  E
0  0.647366  0.317832  0.875353  0.993592  1
1  0.504790  0.041806  0.113889  0.445370  2
2  0.769335  0.120647  0.749565  0.935732  3
3  0.215003  0.497402  0.795033  0.246890  1
4  0.841577  0.211128  0.248779  0.250432  1
5  0.045797  0.710889  0.257784  0.207661  4
6  0.229536  0.094308  0.464018  0.402725  3
7  0.067887  0.591637  0.949509  0.858394  2
8  0.827660  0.348025  0.507488  0.343006  3
9  0.559795  0.820231  0.461300  0.921024  1

I would be willing to do this with Matplotlib or any other plotting library. So far the above code can plot all the categories combined on one plot. Here is the code to generate the above data and produce the plot:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()

# Data
df = pd.DataFrame(np.random.rand(10,4),columns=list('ABCD'))
df['E'] = [1,2,3,1,1,4,3,2,3,1]

# Boxplot
bp = ax.boxplot(df.iloc[:,:-1].values, widths=0.2)
plt.show()

In this example, the categories are 1,2,3,4. I would like to plot separate boxplots side-by-side on the same figure, for only categories 1 and 2 and show the category names in the legend.

Is there a way to do this?

Additional Information:

The output should look similar to the 3rd figure from here - replace "Yes","No" by "1","2".

Upvotes: 12

Views: 45209

Answers (2)

banderlog013
banderlog013

Reputation: 2505

An addition to @Paul_H answer.

Side-by-side boxplots on the single matplotlib.axes.Axes, no seaborn:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


df = pd.DataFrame(np.random.rand(10,4), columns=list('ABCD'))
df['E'] = [1, 2, 1, 1, 1, 2, 1, 2, 2, 1]

mask_e = df['E'] == 1

# prepare data
data_to_plot = [df[mask_e]['A'], df[~mask_e]['A'],
                df[mask_e]['B'], df[~mask_e]['B'],
                df[mask_e]['C'], df[~mask_e]['C'],
                df[mask_e]['D'], df[~mask_e]['D']]

# Positions defaults to range(1, N+1) where N is the number of boxplot to be drawn.
# we will move them a little, to visually group them
plt.figure(figsize=(10, 6))
box = plt.boxplot(data_to_plot,
                  positions=[1, 1.6, 2.5, 3.1, 4, 4.6, 5.5, 6.1],
                  labels=['A1','A0','B1','B0','C1','C0','D1','D0'])

result

Upvotes: 5

Paul H
Paul H

Reputation: 68116

Starting with this:

import numpy
import pandas
from matplotlib import pyplot
import seaborn
seaborn.set(style="ticks")

# Data
df = pandas.DataFrame(numpy.random.rand(10,4), columns=list('ABCD'))
df['E'] = [1, 2, 3, 1, 1, 4, 3, 2, 3, 1]

You've got a couple of options. If separate axes are ok,

fig, axes = pyplot.subplots(ncols=4, figsize=(12, 5), sharey=True)
df.query("E in [1, 2]").boxplot(by='E', return_type='axes', ax=axes)

enter image description here

If you want 1 axes, I think seaborn will be easier. You just need to clean up your data.

ax = (
    df.set_index('E', append=True)  # set E as part of the index
      .stack()                      # pull A - D into rows 
      .to_frame()                   # convert to a dataframe
      .reset_index()                # make the index into reg. columns
      .rename(columns={'level_2': 'quantity', 0: 'value'})  # rename columns
      .drop('level_0', axis='columns')   # drop junk columns
      .pipe((seaborn.boxplot, 'data'), x='E', y='value', hue='quantity', order=[1, 2])  
)
seaborn.despine(trim=True)

enter image description here

The cool thing about seaborn is that tweaking the parameters slightly can achieve a lot in terms of the plot's layout. If we switch our hue and x variables, we get:

ax = (
    df.set_index('E', append=True)  # set E as part of the index
      .stack()                      # pull A - D into rows 
      .to_frame()                   # convert to a dataframe
      .reset_index()                # make the index into reg. columns
      .rename(columns={'level_2': 'quantity', 0: 'value'})  # rename columns
      .drop('level_0', axis='columns')   # drop junk columns
      .pipe((seaborn.boxplot, 'data'), x='quantity', y='value', hue='E', hue_order=[1, 2])  
)
seaborn.despine(trim=True)

enter image description here

If you're curious, the resulting dataframe looks something like this:

    E quantity     value
0   1        A  0.935433
1   1        B  0.862290
2   1        C  0.197243
3   1        D  0.977969
4   2        A  0.675037
5   2        B  0.494440
6   2        C  0.492762
7   2        D  0.531296
8   3        A  0.119273
9   3        B  0.303639
10  3        C  0.911700
11  3        D  0.807861

Upvotes: 19

Related Questions