user3017048

Reputation: 2911

Combine two dataframe boxplots in a twinx figure

I want to display two Pandas dataframes as boxplots within one figure. As the two dataframes have different value ranges, I would like to combine them in a twinx figure.

Reduced to the minimum, I have tried the following:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df1 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

df2 = pd.DataFrame(np.random.randint(100,200,size=(100, 2)), columns=list('EF'))

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

df1.boxplot(ax=ax1)
df2.boxplot(ax=ax2)

plt.show()

As expected, the result is not what I want: there should be 6 boxes on the plot, but the boxes of the two dataframes are drawn on top of each other.


How can I manage to have the boxplots next to each other? I tried to set some dummy scatter points on ax1 and ax2, but this did not really help.

Upvotes: 1

Views: 2829

Answers (1)

Chiel

Reputation: 6194

One solution is to concatenate the two dataframes and plot them through a mask. To build the mask, (dfs == dfs) | dfs.isnull() first produces a frame that is True everywhere (the isnull() term covers the NaN cells, where dfs == dfs is False). Combining it with the column-name tests then sets every cell in columns 'E' and 'F' to False. Indexing with this 2D mask turns the last two columns into NaN, so only the first four boxes are drawn, while the masked columns still keep their tick positions at the bottom. With the inverse mask ~mask, the last two boxes are drawn on their own axis and the first four are masked.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df1 = pd.DataFrame(np.random.randint(  0,100,size=(100, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(100,200,size=(100, 2)), columns=list('EF'  ))

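# Stack the frames; rows from df1 are NaN in E/F and vice versa
dfs = pd.concat([df1, df2])
# All-True frame, then switched off for every cell in columns E and F
mask = ((dfs == dfs) | dfs.isnull()) & (dfs.columns != 'E') & (dfs.columns != 'F')

fig, ax1 = plt.subplots()
dfs[mask].boxplot(ax=ax1)   # A-D on the left axis; the E/F slots stay empty

ax2 = ax1.twinx()
dfs[~mask].boxplot(ax=ax2)  # E/F on the right axis; the A-D slots stay empty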

plt.show()
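
To make the effect of the mask easier to see, here is a small sketch on toy data. The frames, the `keep` array and the use of `ignore_index=True` are assumptions made for illustration and are not part of the code above; the mask is also built with `np.tile` instead of the `dfs == dfs` trick, which should give an equivalent boolean frame.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for df1 and df2 (hypothetical data, for illustration only)
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"E": [101, 102]})

# Rows from df1 get NaN in column E, rows from df2 get NaN in A and B
dfs = pd.concat([df1, df2], ignore_index=True)

# One boolean per column, repeated for every row to form a 2D mask
keep = ~dfs.columns.isin(["E"])
mask = pd.DataFrame(np.tile(keep, (len(dfs), 1)),
                    index=dfs.index, columns=dfs.columns)

left = dfs[mask]    # column E becomes all NaN, A and B keep their values
right = dfs[~mask]  # columns A and B become all NaN, E keeps its values

print(left)
print(right)
```

Because the masked columns are still present (just filled with NaN), each boxplot call reserves all column positions on the x-axis, which is why the boxes end up side by side.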

The box plots
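
As a possible alternative to masking, the boxes can be placed by hand with matplotlib's own `Axes.boxplot` and its `positions` argument. This is only a sketch under some assumptions: it bypasses `DataFrame.boxplot`, it relies on `manage_ticks=False` (available in matplotlib 3.1 and later, as far as I know), and the names `pos1`, `pos2`, `bp1` and `bp2` are made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch; drop it to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list("ABCD"))
df2 = pd.DataFrame(np.random.randint(100, 200, size=(100, 2)), columns=list("EF"))

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

n1, n2 = df1.shape[1], df2.shape[1]
pos1 = list(range(1, n1 + 1))            # x positions 1..4 for A-D
pos2 = list(range(n1 + 1, n1 + n2 + 1))  # x positions 5..6 for E-F

# A 2D array gives one box per column; manage_ticks=False stops each call
# from overwriting the ticks of the shared x-axis
bp1 = ax1.boxplot(df1.to_numpy(), positions=pos1, manage_ticks=False)
bp2 = ax2.boxplot(df2.to_numpy(), positions=pos2, manage_ticks=False)

# Set the shared x-axis once, covering all six boxes
ax1.set_xlim(0.5, n1 + n2 + 0.5)
ax1.set_xticks(pos1 + pos2)
ax1.set_xticklabels(list(df1.columns) + list(df2.columns))

# plt.show()  # uncomment when using an interactive backend
```

The trade-off is that the tick labels have to be set manually, but no NaN-padded frame is needed and each dataframe keeps its own y-axis.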

Upvotes: 3
