Reputation: 19
I have a dataset of more than 50 features that correspond to specific movements during leg rehabilitation. I compare the group that used our rehabilitation device with a group recovering without it. Each group includes patients with 3 diagnoses, and I want to compare boxplots of before (red boxplot) and after (blue boxplot) rehab for each diagnosis.
Control group data:
dataKONTR
Row DG DKK ... LOS_DCL_LB LOS_DCL_L LOS_DCL_LF
0 Williams1 distorze 0.0 ... 63 57 78
1 Williams2 distorze 0.0 ... 91 68 67
2 Norton1 LCA 1.0 ... 58 90 64
3 Norton2 LCA 1.0 ... 29 91 87
4 Chavender1 distorze 1.0 ... 61 56 75
5 Chavender2 distorze 1.0 ... 54 74 80
6 Bendis1 distorze 1.0 ... 32 57 97
7 Bendis2 distorze 1.0 ... 55 69 79
8 Shawn1 AS 1.0 ... 15 74 75
9 Shawn2 AS 1.0 ... 67 86 79
10 Cichy1 LCA 0.0 ... 45 83 80
This is the snippet I was using and the output I am getting.
import pandas as pd
import matplotlib.pyplot as plt

temp = "c:/Users/novos/ŠKOLA/Statistika/data Mariana/%s.xlsx"
dataKU = pd.read_excel(temp % "VestlabEXP_KU", engine="openpyxl", skipfooter=85)  # patients using our rehabilitation tool
dataKONTR = pd.read_excel(temp % "VestlabEXP_kontr", engine="openpyxl", skipfooter=51)  # control group
dataKU_diag = dataKU.dropna()
dataKONTR_diag = dataKONTR.dropna()
dataKUBefore = dataKU_diag[dataKU_diag['Row'].str.endswith("1")]  # rows ending with 1 are before rehab
dataKUAfter = dataKU_diag[dataKU_diag['Row'].str.endswith("2")]  # rows ending with 2 are after rehab
dataKONTRBefore = dataKONTR_diag[dataKONTR_diag['Row'].str.endswith("1")]
dataKONTRAfter = dataKONTR_diag[dataKONTR_diag['Row'].str.endswith("2")]
b1 = dataKUBefore.boxplot(column=list(dataKUBefore.filter(regex='LOS_RT')), by="DG", rot=45, color=dict(boxes='r', whiskers='r', medians='r', caps='r'), layout=(2, 4), return_type='axes')
plt.ylim(0.5, 1.5)
plt.suptitle("")
plt.suptitle("Before, KU")
b2 = dataKUAfter.boxplot(column=list(dataKUAfter.filter(regex='LOS_RT')), by="DG", rot=45, color=dict(boxes='b', whiskers='b', medians='b', caps='b'), layout=(2, 4), return_type='axes')
# dataKUPredP
plt.suptitle("")
plt.suptitle("After, KU")
plt.ylim(0.5, 1.5)
plt.show()
The output is two separate figures: the red boxplots show all the "before rehab" data and the blue boxplots all the "after rehab" data.
How can I place the red and blue boxplots next to each other for each diagnosis?
Thank you for any help.
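The before/after split relies on the trailing digit of the Row label. A minimal sketch on a hypothetical toy frame (the labels and values below are invented for illustration) comparing str.contains with str.endswith: contains matches the digit anywhere in the label, whereas endswith only checks the last character, which is what "ending with 1" describes.

```python
import pandas as pd

# Hypothetical toy data: each patient appears twice, suffix 1 = before, 2 = after.
# "Shawn11"/"Shawn12" are invented labels with an extra digit in the name.
data = pd.DataFrame({
    "Row": ["Williams1", "Williams2", "Norton1", "Norton2", "Shawn11", "Shawn12"],
    "DG": ["distorze", "distorze", "LCA", "LCA", "AS", "AS"],
    "LOS_RT": [0.9, 1.1, 0.8, 1.0, 1.2, 1.3],
})

# contains("1") also picks up "Shawn12", because "1" occurs inside the label
before_contains = data[data["Row"].str.contains("1")]

# endswith looks only at the final character, so each row lands in one group
before = data[data["Row"].str.endswith("1")]
after = data[data["Row"].str.endswith("2")]

print(len(before_contains), len(before), len(after))
```

With labels like the ones in the posted data (a single trailing digit), both approaches give the same split; endswith simply keeps it correct if a label ever contains another digit.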
Upvotes: 1
Views: 369
Reputation: 11395
You can try to concatenate the data into a single dataframe:
dataKUPlot = pd.concat({
'Before': dataKUBefore,
'After': dataKUAfter,
}, names=['When'])
You should see an additional index level named When in the output.
Using the example data you posted, it looks like this:
>>> pd.concat({'Before': df, 'After': df}, names=['When'])
Row DG DKK ... LOS_DCL_LB LOS_DCL_L LOS_DCL_LF
When
Before 0 Williams1 distorze 0.0 ... 63 57 78
1 Williams2 distorze 0.0 ... 91 68 67
2 Norton1 LCA 1.0 ... 58 90 64
3 Norton2 LCA 1.0 ... 29 91 87
4 Chavender1 distorze 1.0 ... 61 56 75
After 0 Williams1 distorze 0.0 ... 63 57 78
1 Williams2 distorze 0.0 ... 91 68 67
2 Norton1 LCA 1.0 ... 58 90 64
3 Norton2 LCA 1.0 ... 29 91 87
4 Chavender1 distorze 1.0 ... 61 56 75
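A small sketch of what this concatenation produces, using two hypothetical two-row frames in place of dataKUBefore and dataKUAfter: the dict keys become the outer MultiIndex level, names=['When'] labels that level, and groupby can then mix the DG column with the When index level, which is what the by=['DG', 'When'] grouper relies on.

```python
import pandas as pd

# Hypothetical stand-ins for dataKUBefore / dataKUAfter
before = pd.DataFrame({"Row": ["Williams1", "Norton1"], "DG": ["distorze", "LCA"], "LOS_RT": [0.9, 0.8]})
after = pd.DataFrame({"Row": ["Williams2", "Norton2"], "DG": ["distorze", "LCA"], "LOS_RT": [1.1, 1.0]})

# Dict keys become the outer index level; names=['When'] gives that level a name
combined = pd.concat({"Before": before, "After": after}, names=["When"])
print(combined.index.names)
print(combined.index.get_level_values("When").tolist())

# groupby resolves 'DG' as a column and 'When' as an index level
means = combined.groupby(["DG", "When"])["LOS_RT"].mean()
print(means)
```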
Then you can plot all of the boxes with a single command, and thus on the same plots, by modifying the by grouper:
dataKUPlot.boxplot(column=dataKUPlot.filter(regex='LOS_RT').columns.to_list(), by=['DG', 'When'], rot=45, layout=(2, 4), return_type='axes')
I believe that’s the only “simple” way, but I’m afraid the result looks a little cluttered.
Any other way implies plotting manually with matplotlib, which also gives you better control. For example, iterate over all the desired columns:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=3, sharey=True, sharex=True)
redboxes = {f'{x}props': dict(color='r') for x in ['box', 'whisker', 'median', 'cap']}
blueboxes = {f'{x}props': dict(color='b') for x in ['box', 'whisker', 'median', 'cap']}
ax_it = axes.flat
for colname, ax in zip(dataKUBefore.filter(regex='LOS_RT').columns, ax_it):
    # Making a dataframe here to ensure the same ordering of diagnoses in both groups
    show = pd.DataFrame({
        'before': dataKUBefore[colname].groupby(dataKUBefore['DG']).agg(list),
        'after': dataKUAfter[colname].groupby(dataKUAfter['DG']).agg(list),
    })
    # One x slot per diagnosis; before/after boxes are shifted left/right of it
    pos = 1 + np.arange(len(show))
    ax.boxplot(show['before'].values, positions=pos - .15, **redboxes)
    ax.boxplot(show['after'].values, positions=pos + .15, **blueboxes)
    ax.set_xticks(pos)
    ax.set_xticklabels(show.index, rotation=45)
    ax.set_title(colname)
    ax.grid(axis='both')

# Hide remaining axes:
for ax in ax_it:
    ax.axis('off')
plt.show()
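The core trick above is the positions argument: every diagnosis gets one integer slot, and the two series are shifted by 0.15 to either side of it. A minimal standalone sketch for a single feature, on hypothetical normally distributed samples (the values are invented), also reuses the box artists returned by Axes.boxplot as legend handles so the red/blue meaning is visible on the plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
diagnoses = ["AS", "LCA", "distorze"]
# Hypothetical samples of one LOS_RT feature per diagnosis, before and after rehab
before = [rng.normal(1.0, 0.1, 8) for _ in diagnoses]
after = [rng.normal(1.1, 0.1, 8) for _ in diagnoses]

# Same per-element colouring as the answer: red for before, blue for after
red = {f"{x}props": dict(color="r") for x in ["box", "whisker", "median", "cap"]}
blue = {f"{x}props": dict(color="b") for x in ["box", "whisker", "median", "cap"]}

pos = 1 + np.arange(len(diagnoses))  # one x slot per diagnosis
fig, ax = plt.subplots()
# Shift each series 0.15 left/right of its slot; widths=0.25 keeps the pair from overlapping
bp_before = ax.boxplot(before, positions=pos - 0.15, widths=0.25, **red)
bp_after = ax.boxplot(after, positions=pos + 0.15, widths=0.25, **blue)

# boxplot placed ticks at the shifted positions; reset them to the slot centres
ax.set_xticks(pos)
ax.set_xticklabels(diagnoses)
# The first box of each series doubles as its legend handle
ax.legend([bp_before["boxes"][0], bp_after["boxes"][0]], ["Before", "After"])
```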
Upvotes: 2
Reputation: 80329
You could add a new column to separate 'Before' and 'After'. Seaborn's boxplots can use that new column as hue. sns.catplot(kind='box', ...) creates a grid of boxplots:
import seaborn as sns
import pandas as pd
import numpy as np
names = ['Adams', 'Arthur', 'Buchanan', 'Buren', 'Bush', 'Carter', 'Cleveland', 'Clinton', 'Coolidge', 'Eisenhower', 'Fillmore', 'Ford', 'Garfield', 'Grant', 'Harding', 'Harrison', 'Hayes', 'Hoover', 'Jackson', 'Jefferson', 'Johnson', 'Kennedy', 'Lincoln', 'Madison', 'McKinley', 'Monroe', 'Nixon', 'Obama', 'Pierce', 'Polk', 'Reagan', 'Roosevelt', 'Taft', 'Taylor', 'Truman', 'Trump', 'Tyler', 'Washington', 'Wilson']
rows = np.array([(name + '1', name + '2') for name in names]).flatten()
dataKONTR = pd.DataFrame({'Row': rows,
'DG': np.random.choice(['AS', 'Distorze', 'LCA'], len(rows)),
'LOS_RT_A': np.random.randint(15, 100, len(rows)),
'LOS_RT_B': np.random.randint(15, 100, len(rows)),
'LOS_RT_C': np.random.randint(15, 100, len(rows)),
'LOS_RT_D': np.random.randint(15, 100, len(rows)),
'LOS_RT_E': np.random.randint(15, 100, len(rows)),
'LOS_RT_F': np.random.randint(15, 100, len(rows))})
dataKONTR = dataKONTR.dropna()
dataKONTR['When'] = ['Before' if r[-1] == '1' else 'After' for r in dataKONTR['Row']]
cols = [c for c in dataKONTR.columns if 'LOS_RT' in c]
df_long = dataKONTR.melt(value_vars=cols, var_name='Which', value_name='Value', id_vars=['When', 'DG'])
g = sns.catplot(kind='box', data=df_long, x='DG', col='Which', col_wrap=3, y='Value', hue='When')
g.set_axis_labels('', '') # remove the x and y labels
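catplot works on long-format data: one row per measured value, with the feature name in its own column so that it can drive col='Which'. A tiny sketch of the melt step on a hypothetical two-row frame (the patient label and values are invented) shows what df_long looks like before plotting.

```python
import pandas as pd

# Hypothetical wide frame: one row per measurement, one column per feature
wide = pd.DataFrame({
    "Row": ["Adams1", "Adams2"],
    "DG": ["LCA", "LCA"],
    "LOS_RT_A": [40, 55],
    "LOS_RT_B": [60, 70],
})
# Same rule as in the answer: a trailing '1' means before rehab
wide["When"] = ["Before" if r[-1] == "1" else "After" for r in wide["Row"]]

# melt stacks the feature columns: each (row, feature) pair becomes one row,
# keeping 'When' and 'DG' as identifiers for hue and x
long = wide.melt(id_vars=["When", "DG"], value_vars=["LOS_RT_A", "LOS_RT_B"],
                 var_name="Which", value_name="Value")
print(long)
```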
Upvotes: 1