How to create nested boxplots with groupby

Question

I have a dataset of more than 50 features that correspond to specific movements during leg rehabilitation. I compare the group that used our rehabilitation device with the group recovering without it. Each group includes patients with 3 diagnoses, and I want to compare boxplots of before (red boxplot) and after (blue boxplot) for each diagnosis.

Control group data:

dataKONTR 
           Row        DG  DKK  ...  LOS_DCL_LB  LOS_DCL_L  LOS_DCL_LF
0    Williams1    distorze  0.0  ...          63         57          78
1    Williams2    distorze  0.0  ...          91         68          67
2    Norton1           LCA  1.0  ...          58         90          64
3    Norton2           LCA  1.0  ...          29         91          87
4    Chavender1   distorze  1.0  ...          61         56          75
5    Chavender2   distorze  1.0  ...          54         74          80
6    Bendis1      distorze  1.0  ...          32         57          97
7    Bendis2      distorze  1.0  ...          55         69          79
8    Shawn1             AS  1.0  ...          15         74          75
9    Shawn2             AS  1.0  ...          67         86          79
10   Cichy1            LCA  0.0  ...          45         83          80

This is the snippet I was using and the output I am getting.

import pandas as pd
import matplotlib.pyplot as plt

temp = "c:/Users/novos/ŠKOLA/Statistika/data Mariana/%s.xlsx"

dataKU = pd.read_excel(temp % "VestlabEXP_KU", engine = "openpyxl", skipfooter= 85)     # patients using our rehabilitation tool
dataKONTR = pd.read_excel(temp % "VestlabEXP_kontr", engine = "openpyxl", skipfooter=51)    # control group

dataKU_diag = dataKU.dropna()
dataKONTR_diag = dataKONTR.dropna()


dataKUBefore = dataKU_diag[dataKU_diag['Row'].str.contains("1")]        # Patients data ending with 1 are before rehab
dataKUAfter = dataKU_diag[dataKU_diag['Row'].str.contains("2")]         # Patients data ending with 2 are after rehab

dataKONTRBefore = dataKONTR_diag[dataKONTR_diag['Row'].str.contains("1")]
dataKONTRAfter = dataKONTR_diag[dataKONTR_diag['Row'].str.contains("2")]

b1 = dataKUBefore.boxplot(column=list(dataKUBefore.filter(regex='LOS_RT')), by="DG", rot = 45, color=dict(boxes='r', whiskers='r', medians='r', caps='r'),layout=(2,4),return_type='axes')


plt.ylim(0.5, 1.5)
plt.suptitle("")
plt.suptitle("Before, KU")

b2 = dataKUAfter.boxplot(column=list(dataKUAfter.filter(regex='LOS_RT')), by="DG", rot = 45, color=dict(boxes='b', whiskers='b', medians='b', caps='b'),layout=(2,4),return_type='axes')
# dataKUPredP
plt.suptitle("")
plt.suptitle("After, KU")
plt.ylim(0.5, 1.5)
plt.show()

The output is two separate figures: the red boxplots show all the "before rehab" data and the blue boxplots show all the "after rehab" data.

How can I place the red and blue boxplots next to each other for each diagnosis?

Thank you for any help.

Cimbali · Accepted Answer

You can try to concatenate the data into a single dataframe:

dataKUPlot = pd.concat({
    'Before': dataKUBefore,
    'After': dataKUAfter,
}, names=['When'])

You should see an additional index level named When in the output. Using the example data you posted it looks like this:

>>> pd.concat({'Before': df, 'After': df}, names=['When'])
                 Row        DG  DKK  ...  LOS_DCL_LB  LOS_DCL_L  LOS_DCL_LF
When                                                                       
Before 0   Williams1  distorze  0.0  ...          63         57          78
       1   Williams2  distorze  0.0  ...          91         68          67
       2     Norton1       LCA  1.0  ...          58         90          64
       3     Norton2       LCA  1.0  ...          29         91          87
       4  Chavender1  distorze  1.0  ...          61         56          75
After  0   Williams1  distorze  0.0  ...          63         57          78
       1   Williams2  distorze  0.0  ...          91         68          67
       2     Norton1       LCA  1.0  ...          58         90          64
       3     Norton2       LCA  1.0  ...          29         91          87
       4  Chavender1  distorze  1.0  ...          61         56          75

Then you can plot all of the boxes with a single command and thus on the same plots, by modifying the by grouper:

dataKUPlot.boxplot(column=dataKUPlot.filter(regex='LOS_RT').columns.to_list(), by=['DG', 'When'], rot = 45, layout=(2,4), return_type='axes')

I believe that’s the only “simple” way, but I’m afraid the resulting plot looks a little cluttered.
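To try the single-command approach without the original Excel files, here is a minimal sketch on synthetic data. The `fake_group` helper, the column names `LOS_RT_F` and `LOS_RT_B`, and the random values are all made up for illustration; only the `DG` column and the `When` level come from the question and the answer above. Moving `When` out of the index with `reset_index` is an optional extra step, used here so that both grouping keys are ordinary columns.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
diagnoses = ["AS", "LCA", "distorze"]

def fake_group(shift):
    # Hypothetical stand-in for dataKUBefore / dataKUAfter (5 patients per diagnosis)
    return pd.DataFrame({
        "DG": np.repeat(diagnoses, 5),
        "LOS_RT_F": rng.normal(1.0 + shift, 0.1, 15),
        "LOS_RT_B": rng.normal(1.0 + shift, 0.1, 15),
    })

before, after = fake_group(0.0), fake_group(-0.1)

# The dict keys become a new outer index level named "When"
combined = pd.concat({"Before": before, "After": after}, names=["When"])
# Optional: turn that index level into a regular column
combined = combined.reset_index(level="When")

cols = combined.filter(regex="LOS_RT").columns.to_list()
# One subplot per LOS_RT column, one box per (DG, When) pair
axes = combined.boxplot(column=cols, by=["DG", "When"], rot=45,
                        layout=(1, 2), return_type="axes")
group_keys = list(combined.groupby(["DG", "When"]).groups)
```

Each subplot then contains one box per (diagnosis, When) combination, so "Before" and "After" sit next to each other within each diagnosis, although all boxes share the same default colors.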

Any other way implies manual plotting with matplotlib, which also gives you better control. For example, iterate over all the desired columns:

import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=3, sharey=True, sharex=True)
pos = 1 + np.arange(max(dataKUBefore['DG'].nunique(), dataKUAfter['DG'].nunique()))
redboxes = {f'{x}props': dict(color='r') for x in ['box', 'whisker', 'median', 'cap']}
blueboxes = {f'{x}props': dict(color='b') for x in ['box', 'whisker', 'median', 'cap']}

ax_it = axes.flat
for colname, ax in zip(dataKUBefore.filter(regex='LOS_RT').columns, ax_it):
    # Making a dataframe here to ensure the same ordering
    show = pd.DataFrame({
        'before': dataKUBefore[colname].groupby(dataKUBefore['DG']).agg(list),
        'after': dataKUAfter[colname].groupby(dataKUAfter['DG']).agg(list),
    })

    ax.boxplot(show['before'].values, positions=pos - .15, **redboxes)
    ax.boxplot(show['after'].values, positions=pos + .15, **blueboxes)

    ax.set_xticks(pos)
    ax.set_xticklabels(show.index, rotation=45) 
    ax.set_title(colname)
    ax.grid(axis='both')

# Hide remaining axes:
for ax in ax_it:
    ax.axis('off')
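If it helps to see the manual approach in isolation, the following sketch draws one such subplot from synthetic data and adds a legend. The random samples and the `props` helper are invented for this example; the explicit `widths=0.25` and the proxy `Line2D` legend handles are additions of mine rather than part of the code above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import numpy as np

rng = np.random.default_rng(1)
labels = ["AS", "LCA", "distorze"]
# Synthetic measurements: one array of 8 values per diagnosis
before = [rng.normal(1.0, 0.1, 8) for _ in labels]
after = [rng.normal(0.9, 0.1, 8) for _ in labels]

def props(color):
    # Hypothetical helper: same color for box, whiskers, median and caps
    return {f"{x}props": dict(color=color) for x in ["box", "whisker", "median", "cap"]}

pos = 1 + np.arange(len(labels))
fig, ax = plt.subplots()

# Shift the two series left and right of each tick; the boxes are 0.3 apart,
# so a width of 0.25 leaves a small gap between them
bp_before = ax.boxplot(before, positions=pos - 0.15, widths=0.25, **props("r"))
bp_after = ax.boxplot(after, positions=pos + 0.15, widths=0.25, **props("b"))

ax.set_xticks(pos)
ax.set_xticklabels(labels, rotation=45)

# Proxy artists so the legend shows one entry per color
legend = ax.legend([Line2D([], [], color="r"), Line2D([], [], color="b")],
                   ["Before", "After"])
```

The same `widths` argument and legend call could be dropped into the loop above, once per axis.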
