Reputation: 2911
I want to display two Pandas dataframes as boxplots within one figure. Because the two dataframes have different value ranges, I would like to combine them in a twinx figure.
Reduced to the minimum, I have tried the following:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df1 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(100,200,size=(100, 2)), columns=list('EF'))
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
df1.boxplot(ax=ax1)
df2.boxplot(ax=ax2)
plt.show()
As expected, the result is not what it should look like (there should actually be 6 boxes on the plot!).
How can I get the boxplots to appear next to each other? I tried adding some dummy scatter points to ax1 and ax2, but this did not really help.
Upvotes: 1
Views: 2829
Reputation: 6194
The best solution is to concatenate the data frames for plotting and to use a mask. To create the mask, we use (dfs == dfs) | dfs.isnull() to build a full matrix of True values, and then we select all column names that are not 'E' or 'F'. This gives a 2D matrix that lets you plot only the first four boxes, as the last two are masked (their ticks still appear at the bottom). With the inverse mask ~mask you plot the last two on their own axis and mask the first four.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df1 = pd.DataFrame(np.random.randint( 0,100,size=(100, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(100,200,size=(100, 2)), columns=list('EF' ))
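# Stack both frames; columns missing from one frame are filled with NaN
dfs = pd.concat([df1, df2])
# True for every cell in columns A-D, False for every cell in columns E and F
mask = ((dfs == dfs) | dfs.isnull()) & (dfs.columns != 'E') & (dfs.columns != 'F')
fig, ax1 = plt.subplots()
# Left axis: only A-D carry data, E and F are masked to NaN
dfs[mask].boxplot(ax=ax1)
ax2 = ax1.twinx()
# Right axis: only E and F carry data, A-D are masked to NaN
dfs[~mask].boxplot(ax=ax2)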
plt.show()
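If it is unclear what the mask actually contains, a small check along these lines may help. It is only a sketch that rebuilds the same dfs and mask as above without plotting anything; the names kept_by_mask, left_columns and right_columns are made up for this illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(100, 200, size=(100, 2)), columns=list('EF'))

# Same construction as in the answer: 200 rows, 6 columns, NaN where a frame had no data
dfs = pd.concat([df1, df2])
mask = ((dfs == dfs) | dfs.isnull()) & (dfs.columns != 'E') & (dfs.columns != 'F')

# Per column: is every cell of that column kept by the mask?
kept_by_mask = mask.all()

# Per column: does any real value survive the mask / the inverse mask?
left_columns = dfs[mask].notna().any()    # what would be drawn on ax1
right_columns = dfs[~mask].notna().any()  # what would be drawn on ax2

print(kept_by_mask)
print(left_columns)
print(right_columns)
```

Both masked frames keep all six columns, which is why each box gets its own slot on the shared x axis; only the columns that still hold values produce a visible box on the respective axis.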
Upvotes: 3