Ep1c1aN
Ep1c1aN

Reputation: 733

Combine multiple box-plots in Pandas with different ranges?

I have 2 datasets, one representing Rootzone (mm) and other representing Tree cover (%). I am able to plot these datasets side by side (as shown below). The code used was:

    fig = plt.subplots(figsize = (16,7))
    ax = [
        plt.subplot(121),
        plt.subplot(122)]
    classified_data.boxplot(grid=False, rot=90, fontsize=10, ax = ax[0])
    classified_treecover.boxplot(grid=False, rot=90, fontsize=10, ax = ax[1])
    ax[0].set_ylabel('Rootzone Storage Capacity (mm)', fontsize = '12')
    ax[1].set_ylabel('Tree Cover (%)', fontsize = '12')
    ax[0].set_title('Rootzone Storage Capacity (mm)')
    ax[1].set_title('Tree Cover (%)')

enter image description here

But I want to have them in the same plot with both Rootzone (on the left-hand y-axis) and Tree cover (on the right-hand y-axis) as their range is different (using something like twinx()). But I want them to be stacked together for a single class on the x-axis (something like as shown below with a twin y-axis for the tree cover). Can someone guide me as to how this can be achieved with my code??

enter image description here

Upvotes: 3

Views: 4373

Answers (1)

Mykola Zotko
Mykola Zotko

Reputation: 17794

To plot two datasets with different ranges in the same figure you need to convert all values to corresponding z scores (standardize your data). You can use the hue parameter in the boxplot() function in seaborn to plot two datasets side by side. Consider the following example with 'mpg' dataset.

   displacement  horsepower origin
0         307.0       130.0    usa
1         350.0       165.0    usa
2         318.0       150.0    usa
3         304.0       150.0    usa
4         302.0       140.0    usa

import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset('mpg')

df1 = df[['displacement', 'origin']].copy()
df2 = df[['horsepower', 'origin']].copy()

# Convert values to z scores.
df1['z_score'] = df1['displacement'].\
apply(lambda x: (x - df1['displacement'].mean()) / df1['displacement'].std())
df2['z_score'] = df2['horsepower'].\
apply(lambda x: (x - df2['horsepower'].mean()) / df2['horsepower'].std())

df1.drop(['displacement'], axis= 1, inplace=True)
df2.drop(['horsepower'], axis=1, inplace=True)

# Add extra column to use it as the 'hue' parameter.
df1['value'] = 'displacement'
df2['value'] = 'horsepower'

df_cat = pd.concat([df1, df2])

ax = sns.boxplot(x='origin', y='z_score', hue='value', data=df_cat)

plt.yticks([])
ax.set_ylabel('')

# Add the left y axis.
ax1 = ax.twinx()
ax1.set_yticks(np.linspace(df['displacement'].min(), df['displacement'].max(), 5))
ax1.spines['right'].set_position(('axes', -0.2))
ax1.set_ylabel('displacement')

# Add the right y axis.
ax2 = ax.twinx()
ax2.set_yticks(np.linspace(df['horsepower'].min(), df['horsepower'].max(), 5))
ax2.spines['right'].set_position(('axes', 1))
ax2.set_ylabel('horsepower')
plt.show()

Figure

Upvotes: 3

Related Questions