Boris AllAboutAnyth

Reputation: 19

How to draw nested (grouped) boxplots with groupby

I have a dataset of more than 50 features, each corresponding to a specific movement during leg rehabilitation. I am comparing the group that used our rehabilitation device with a group that recovered without it. Each group includes patients with 3 diagnoses, and I want to compare boxplots of before (red boxplot) and after (blue boxplot) for each diagnosis.

Control group data:

dataKONTR 
           Row        DG  DKK  ...  LOS_DCL_LB  LOS_DCL_L  LOS_DCL_LF
0    Williams1    distorze  0.0  ...          63         57          78
1    Williams2    distorze  0.0  ...          91         68          67
2    Norton1           LCA  1.0  ...          58         90          64
3    Norton2           LCA  1.0  ...          29         91          87
4    Chavender1   distorze  1.0  ...          61         56          75
5    Chavender2   distorze  1.0  ...          54         74          80
6    Bendis1      distorze  1.0  ...          32         57          97
7    Bendis2      distorze  1.0  ...          55         69          79
8    Shawn1             AS  1.0  ...          15         74          75
9    Shawn2             AS  1.0  ...          67         86          79
10   Cichy1            LCA  0.0  ...          45         83          80

This is the snippet I was using and the output I am getting.

import pandas as pd
import matplotlib.pyplot as plt

temp = "c:/Users/novos/ŠKOLA/Statistika/data Mariana/%s.xlsx"

dataKU = pd.read_excel(temp % "VestlabEXP_KU", engine="openpyxl", skipfooter=85)        # patients using our rehabilitation tool
dataKONTR = pd.read_excel(temp % "VestlabEXP_kontr", engine="openpyxl", skipfooter=51)  # control group

dataKU_diag = dataKU.dropna()
dataKONTR_diag = dataKONTR.dropna()


dataKUBefore = dataKU_diag[dataKU_diag['Row'].str.endswith("1")]        # Rows whose name ends with 1 are before rehab
dataKUAfter = dataKU_diag[dataKU_diag['Row'].str.endswith("2")]         # Rows whose name ends with 2 are after rehab

dataKONTRBefore = dataKONTR_diag[dataKONTR_diag['Row'].str.endswith("1")]
dataKONTRAfter = dataKONTR_diag[dataKONTR_diag['Row'].str.endswith("2")]

red = dict(boxes='r', whiskers='r', medians='r', caps='r')
blue = dict(boxes='b', whiskers='b', medians='b', caps='b')

# First figure: all "before" data, one subplot per LOS_RT column, grouped by diagnosis
b1 = dataKUBefore.boxplot(column=list(dataKUBefore.filter(regex='LOS_RT')), by="DG", rot=45,
                          color=red, layout=(2, 4), return_type='axes')
plt.ylim(0.5, 1.5)
plt.suptitle("Before, KU")  # replaces the automatic "Boxplot grouped by DG" title

# Second figure: all "after" data
b2 = dataKUAfter.boxplot(column=list(dataKUAfter.filter(regex='LOS_RT')), by="DG", rot=45,
                         color=blue, layout=(2, 4), return_type='axes')
plt.suptitle("After, KU")
plt.ylim(0.5, 1.5)
plt.show()

The output is two separate figures (the red boxplots are all the "before rehab" data and the blue boxplots are all the "after rehab" data):

[Figure: Before rehabilitation] [Figure: After rehabilitation]

Can you help me put the red and blue boxplots next to each other for each diagnosis?

Thank you for any help.

Upvotes: 1

Views: 369

Answers (2)

Cimbali

Reputation: 11395

You can try to concatenate the data into a single dataframe:

dataKUPlot = pd.concat({
    'Before': dataKUBefore,
    'After': dataKUAfter,
}, names=['When'])

You should see an additional index level named When in the output. Using the example data you posted it looks like this:

>>> pd.concat({'Before': df, 'After': df}, names=['When'])
                 Row        DG  DKK  ...  LOS_DCL_LB  LOS_DCL_L  LOS_DCL_LF
When                                                                       
Before 0   Williams1  distorze  0.0  ...          63         57          78
       1   Williams2  distorze  0.0  ...          91         68          67
       2     Norton1       LCA  1.0  ...          58         90          64
       3     Norton2       LCA  1.0  ...          29         91          87
       4  Chavender1  distorze  1.0  ...          61         56          75
After  0   Williams1  distorze  0.0  ...          63         57          78
       1   Williams2  distorze  0.0  ...          91         68          67
       2     Norton1       LCA  1.0  ...          58         90          64
       3     Norton2       LCA  1.0  ...          29         91          87
       4  Chavender1  distorze  1.0  ...          61         56          75

Then you can plot all of the boxes with a single command, and thus on the same plots, by calling boxplot on the concatenated dataframe and adding When to the by grouper:

dataKUPlot.boxplot(column=dataKUPlot.filter(regex='LOS_RT').columns.to_list(), by=['DG', 'When'], rot=45, layout=(2,4), return_type='axes')

I believe that’s the only “simple” way, though I’m afraid the result looks a little confusing:

the pandas way
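For anyone who wants to try the single-call approach without the original Excel files, here is a minimal sketch on made-up data. The `fake_group` helper, the two `LOS_RT_*` column names and the value range are assumptions for illustration only. Moving `When` from the index into a regular column with `reset_index` is optional; it just makes the grouping key explicit.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def fake_group(n_per_diag=10):
    """Hypothetical stand-in for dataKUBefore / dataKUAfter."""
    n = 3 * n_per_diag
    return pd.DataFrame({
        "DG": np.tile(["AS", "LCA", "distorze"], n_per_diag),
        "LOS_RT_A": rng.uniform(0.5, 1.5, n),
        "LOS_RT_B": rng.uniform(0.5, 1.5, n),
    })

before, after = fake_group(), fake_group()

# Stack both frames; the dict keys become an index level named "When"
combined = pd.concat({"Before": before, "After": after}, names=["When"])
# Optional: turn that index level into an ordinary column
combined = combined.reset_index(level="When")

cols = combined.filter(regex="LOS_RT").columns.to_list()
# One subplot per column, one box per (diagnosis, time point) pair
axes = combined.boxplot(column=cols, by=["DG", "When"], rot=45,
                        layout=(1, 2), return_type="axes")
plt.suptitle("")  # drop the automatic "Boxplot grouped by ..." title
```

Each subplot then holds one box per (DG, When) combination, so the before and after boxes of a diagnosis sit side by side, but they are not color-coded.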

Any other way implies plotting manually with matplotlib, which also gives you better control. For example, iterate over all desired columns:

import numpy as np  # needed for np.arange below

fig, axes = plt.subplots(nrows=2, ncols=3, sharey=True, sharex=True)
pos = 1 + np.arange(max(dataKUBefore['DG'].nunique(), dataKUAfter['DG'].nunique()))
redboxes = {f'{x}props': dict(color='r') for x in ['box', 'whisker', 'median', 'cap']}
blueboxes = {f'{x}props': dict(color='b') for x in ['box', 'whisker', 'median', 'cap']}

ax_it = axes.flat
for colname, ax in zip(dataKUBefore.filter(regex='LOS_RT').columns, ax_it):
    # Making a dataframe here to ensure the same ordering
    show = pd.DataFrame({
        'before': dataKUBefore[colname].groupby(dataKUBefore['DG']).agg(list),
        'after': dataKUAfter[colname].groupby(dataKUAfter['DG']).agg(list),
    })

    ax.boxplot(show['before'].values, positions=pos - .15, **redboxes)
    ax.boxplot(show['after'].values, positions=pos + .15, **blueboxes)

    ax.set_xticks(pos)
    ax.set_xticklabels(show.index, rotation=45) 
    ax.set_title(colname)
    ax.grid(axis='both')

# Hide remaining axes:
for ax in ax_it:
    ax.axis('off')
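Since the colors carry the before/after meaning, a legend may help. The sketch below is one possible way to add it, shown on a single axes with made-up data (the `before` and `after` lists and the `widths` value are assumptions, not part of the code above). It relies on `ax.boxplot` returning a dict whose `'boxes'` entry holds the box artists, which can be passed to `ax.legend` as handles.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
labels = ["AS", "LCA", "distorze"]
# Hypothetical samples, one array per diagnosis
before = [rng.normal(1.0, 0.1, 20) for _ in labels]
after = [rng.normal(1.1, 0.1, 20) for _ in labels]

pos = 1 + np.arange(len(labels))
redboxes = {f"{x}props": dict(color="r") for x in ["box", "whisker", "median", "cap"]}
blueboxes = {f"{x}props": dict(color="b") for x in ["box", "whisker", "median", "cap"]}

fig, ax = plt.subplots()
bp_before = ax.boxplot(before, positions=pos - 0.15, widths=0.25, **redboxes)
bp_after = ax.boxplot(after, positions=pos + 0.15, widths=0.25, **blueboxes)

ax.set_xticks(pos)
ax.set_xticklabels(labels, rotation=45)

# Use the first box of each series as the legend handle
legend = ax.legend([bp_before["boxes"][0], bp_after["boxes"][0]],
                   ["Before", "After"])
```

In the loop above, the same idea would mean keeping the two return values of `ax.boxplot` and calling `ax.legend` on one of the subplots.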


Upvotes: 2

JohanC

Reputation: 80329

You could add a new column to separate 'Before' and 'After'. Seaborn's boxplots can use that new column as hue. sns.catplot(kind='box', ...) creates a grid of boxplots:

import seaborn as sns
import pandas as pd
import numpy as np

names = ['Adams', 'Arthur', 'Buchanan', 'Buren', 'Bush', 'Carter', 'Cleveland', 'Clinton', 'Coolidge', 'Eisenhower', 'Fillmore', 'Ford', 'Garfield', 'Grant', 'Harding', 'Harrison', 'Hayes', 'Hoover', 'Jackson', 'Jefferson', 'Johnson', 'Kennedy', 'Lincoln', 'Madison', 'McKinley', 'Monroe', 'Nixon', 'Obama', 'Pierce', 'Polk', 'Reagan', 'Roosevelt', 'Taft', 'Taylor', 'Truman', 'Trump', 'Tyler', 'Washington', 'Wilson']
rows = np.array([(name + '1', name + '2') for name in names]).flatten()
dataKONTR = pd.DataFrame({'Row': rows,
                          'DG': np.random.choice(['AS', 'Distorze', 'LCA'], len(rows)),
                          'LOS_RT_A': np.random.randint(15, 100, len(rows)),
                          'LOS_RT_B': np.random.randint(15, 100, len(rows)),
                          'LOS_RT_C': np.random.randint(15, 100, len(rows)),
                          'LOS_RT_D': np.random.randint(15, 100, len(rows)),
                          'LOS_RT_E': np.random.randint(15, 100, len(rows)),
                          'LOS_RT_F': np.random.randint(15, 100, len(rows))})
dataKONTR = dataKONTR.dropna()
dataKONTR['When'] = ['Before' if r[-1] == '1' else 'After' for r in dataKONTR['Row']]
cols = [c for c in dataKONTR.columns if 'LOS_RT' in c]

df_long = dataKONTR.melt(value_vars=cols, var_name='Which', value_name='Value', id_vars=['When', 'DG'])
g = sns.catplot(kind='box', data=df_long, x='DG', col='Which', col_wrap=3, y='Value', hue='When')
g.set_axis_labels('', '') # remove the x and y labels

grid of grouped boxplots
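To make the reshaping step easier to follow, here is a small sketch of what `melt` does to a tiny hand-written frame (the four rows and their values are made up for illustration). Every `LOS_RT_*` column is stacked into a single `Value` column, with `Which` recording the source column, which is the long format that `sns.catplot` expects for `col` and `hue`.

```python
import pandas as pd

# Hypothetical miniature version of dataKONTR
wide = pd.DataFrame({
    "Row": ["Adams1", "Adams2", "Bush1", "Bush2"],
    "DG": ["AS", "AS", "LCA", "LCA"],
    "LOS_RT_A": [60, 70, 55, 65],
    "LOS_RT_B": [40, 45, 50, 52],
})

# Rows ending in 1 are "Before", rows ending in 2 are "After"
wide["When"] = wide["Row"].str[-1].map({"1": "Before", "2": "After"})
cols = [c for c in wide.columns if "LOS_RT" in c]

# One row per (patient measurement, feature) pair
df_long = wide.melt(id_vars=["When", "DG"], value_vars=cols,
                    var_name="Which", value_name="Value")
print(df_long)
```

The 4 rows and 2 feature columns of the wide frame become 8 rows in `df_long`.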

Upvotes: 1
