Kajette

Reputation: 47

Display two boxplots per category (two points in time) in one plot

I have a DataFrame with a column 'Category' and two columns 'T1' and 'T2'. So far I have plotted one boxplot of 'T1' by 'Category' and a second boxplot of 'T2' by 'Category'. 'Category' contains 9 distinct values, and the dataset has about n=350 rows.

If I create a 'normal' boxplot, I get a plot with 9 boxes in it. But that gives me 2 separate plots, one for T1 and one for T2. I want to display 2 boxplots for each category, T1 and T2, in the same plot. I have no idea how to start. I have already read about grouped boxplots and don't think that is the right approach.

I created an example.

import pandas as pd
import seaborn as sns
data = {'Category':  ['eins','zwei','drei', 'vier', 'fünf', 'sechs', 'sieben', 'acht', 'neun', 'eins','zwei','drei', 'vier', 'fünf', 
                      'sechs', 'sieben', 'acht', 'neun'],
        'T1': ['1', '6', '5','8', '4', '7', '5', '7', '1', '7', '3', '2', '1', '4', '7', '5', '7', '1'],
         'T2':['3', '7', '7','9', '8', '10', '8', '9', '3', '10', '9', '5', '3', '8', '9', '6', '7', '5']}

df = pd.DataFrame(data)
df.loc[:, 'T1']=df.loc[:, 'T1'].astype('int')
df.loc[:, 'T2']=df.loc[:, 'T2'].astype('int')
sns.boxplot(x = df.loc[:,'T1'],
            y = df.loc[:,'Category']);
sns.boxplot(x = df.loc[:,'T2'],
            y = df.loc[:,'Category']);

I tried also this:

import matplotlib.pyplot as plt
f, axes = plt.subplots()
sns.boxplot(x="T1",y="Category" ,data=df, palette="Set1")#,ax=axes[0])
sns.boxplot(x="T2",y="Category" ,data=df, palette="Set3")#,ax=axes[0])
#fig.tight_layout()
plt.show()

Then I get both plots in one graph, but they overlay each other. How can I display the T2 boxplot below the T1 boxplot for the respective category?

Upvotes: 2

Views: 53

Answers (1)

seralouk

Reputation: 33147

I would use the hue parameter in the sns.boxplot function:

import matplotlib.pyplot as plt  # needed for plt.show() below
import pandas as pd
import seaborn as sns
data = {'Category':  ['eins','zwei','drei', 'vier', 'fünf', 'sechs', 'sieben', 'acht', 'neun', 'eins','zwei','drei', 'vier', 'fünf', 
                      'sechs', 'sieben', 'acht', 'neun'],
        'T1': ['1', '6', '5','8', '4', '7', '5', '7', '1', '7', '3', '2', '1', '4', '7', '5', '7', '1'],
         'T2':['3', '7', '7','9', '8', '10', '8', '9', '3', '10', '9', '5', '3', '8', '9', '6', '7', '5']}

df = pd.DataFrame(data)
# Assign the converted columns back so the dtype really becomes integer
df['T1'] = df['T1'].astype(int)
df['T2'] = df['T2'].astype(int)
#sns.boxplot(x = df.loc[:,'T1'],
#            y = df.loc[:,'Category']);
#sns.boxplot(x = df.loc[:,'T2'],
#            y = df.loc[:,'Category']);

sns.boxplot(x='Category', y='value', hue='variable', data=df.melt(id_vars='Category', var_name='variable', value_name='value'))

plt.show()

Here, I used df.melt to convert the 'T1' and 'T2' columns into a single 'value' column and a 'variable' column indicating which of these two columns the value came from.

You can print df.melt(id_vars='Category', var_name='variable', value_name='value') to see the reshaped output.
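
To make the reshaping concrete, here is a minimal sketch of what melt does on a tiny frame. The three-row frame `small` is made up purely for illustration; only the column names are taken from the example above.

```python
import pandas as pd

# Hypothetical three-row frame, used only to illustrate the reshaping
small = pd.DataFrame({
    'Category': ['eins', 'zwei', 'drei'],
    'T1': [1, 6, 5],
    'T2': [3, 7, 7],
})

# Wide -> long: every (Category, time point) pair becomes its own row
long_df = small.melt(id_vars='Category', var_name='variable', value_name='value')
print(long_df)
```

The result has six rows: the three T1 values followed by the three T2 values, each labelled in the 'variable' column. That label column is what hue uses to split every category into two boxes.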


Or, with the axes swapped so the boxes are horizontal:

sns.boxplot(y='Category', x='value', hue='variable', data=df.melt(id_vars='Category', var_name='variable', value_name='value'))
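
If the order of the categories or of the two time points matters, a possible variant is to pass order and hue_order explicitly and to draw on an axes object you create yourself. This is only a sketch under a few assumptions: the figure size, the title, and the 'Agg' backend line are my own choices and not part of the original answer.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

categories = ['eins', 'zwei', 'drei', 'vier', 'fünf', 'sechs', 'sieben', 'acht', 'neun']
df = pd.DataFrame({
    'Category': categories * 2,
    'T1': [1, 6, 5, 8, 4, 7, 5, 7, 1, 7, 3, 2, 1, 4, 7, 5, 7, 1],
    'T2': [3, 7, 7, 9, 8, 10, 8, 9, 3, 10, 9, 5, 3, 8, 9, 6, 7, 5],
})

# Long format: one row per (Category, time point) measurement
long_df = df.melt(id_vars='Category', var_name='variable', value_name='value')

fig, ax = plt.subplots(figsize=(6, 8))  # figure size chosen arbitrarily
sns.boxplot(
    y='Category', x='value', hue='variable', data=long_df,
    order=categories,        # fix the category order on the y axis
    hue_order=['T1', 'T2'],  # T1 box is drawn before the T2 box in each category
    ax=ax,
)
ax.set_title('T1 vs. T2 per category')  # illustrative title
```

Call plt.show() in an interactive session, or fig.savefig(...) in a script, to see the result.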

Upvotes: 0
