normanius

Reputation: 9822

How to overlay a scatterplot on top of boxplot with sns.catplot?

It is possible to combine axes-level plot functions by simply calling them successively:

import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
sns.set_theme(style="whitegrid")
ax = sns.boxplot(x="day", y="total_bill", data=tips)
ax = sns.stripplot(x="day", y="total_bill", data=tips, 
                   color=".25", alpha=0.7, ax=ax)
plt.show()

How can this be achieved with the figure-level function sns.catplot()? Successive calls to sns.catplot() create a new figure each time, and passing a figure handle is not possible.

# This creates two separate figures:
sns.catplot(..., kind="box")
sns.catplot(..., kind="strip")

Upvotes: 7

Views: 4379

Answers (1)

normanius

Reputation: 9822

The following works for me with seaborn v0.11:

import seaborn as sns 
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
g = sns.catplot(x="sex", y="total_bill", hue="smoker", col="time",
                data=tips, kind="box",
                palette=["#FFA7A0", "#ABEAC9"],
                height=4, aspect=.7)
g.map_dataframe(sns.stripplot, x="sex", y="total_bill", 
                hue="smoker", palette=["#404040"], 
                alpha=0.6, dodge=True)
# g.map(sns.stripplot, "sex", "total_bill", "smoker", 
#       palette=["#404040"], alpha=0.6, dodge=True)
plt.show()

Explanation: In a first pass, sns.catplot() creates the box plots. It returns a sns.FacetGrid that holds one axes per value of the categorical parameter time. In a second pass, this FacetGrid is reused to overlay the scatter plot (sns.stripplot or, alternatively, sns.swarmplot).

The example uses map_dataframe() because data is a pandas DataFrame with named columns; map() also works, as the commented-out lines show. Setting dodge=True shifts the points along the categorical axis for each hue category, so they line up with the dodged boxes. Finally, note that calling sns.catplot() with kind="box" first and overlaying the scatter in a second step implicitly avoids the problem of duplicated legend entries.

Resulting figure
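The two-pass idea can also be checked without downloading the tips dataset. The sketch below is only an illustration under a few assumptions: the DataFrame is synthetic (made up here, not the real tips data), the grouping columns are declared categorical so that every facet sees the same level order, and sns.swarmplot is used for the overlay instead of sns.stripplot. Counting the collections on each axes before and after the second pass shows that the points land on the axes that catplot created.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips data (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    # Categorical dtype keeps the level order identical in every facet.
    "sex": pd.Categorical(rng.choice(["Male", "Female"], size=n)),
    "smoker": pd.Categorical(rng.choice(["Yes", "No"], size=n)),
    "time": pd.Categorical(rng.choice(["Lunch", "Dinner"], size=n)),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=n),
})

# First pass: catplot draws the boxes and returns a FacetGrid.
g = sns.catplot(x="sex", y="total_bill", hue="smoker", col="time",
                data=df, kind="box", palette=["#FFA7A0", "#ABEAC9"],
                height=4, aspect=.7)
axes = list(g.axes.flat)  # one axes per value of "time"
collections_before = [len(ax.collections) for ax in axes]

# Second pass: overlay the points on every facet of the same grid.
g.map_dataframe(sns.swarmplot, x="sex", y="total_bill", hue="smoker",
                palette=["#404040", "#404040"], alpha=0.6, dodge=True)
collections_after = [len(ax.collections) for ax in axes]

print(type(g).__name__, len(axes))
print(collections_before, collections_after)
```

If the grouping columns are plain strings rather than categoricals, it is probably safer to pass order and hue_order explicitly in both passes, because each facet otherwise derives the level order from its own subset of the data.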

Alternative (not recommended): It is also possible to create a FacetGrid object first and then call map_dataframe() twice. While this works for this example, in other situations one has to make sure that the mapping of properties is synchronized correctly across facets (see the warning in the docs). sns.catplot() takes care of this, as well as the legend.

g = sns.FacetGrid(tips, col="time", height=4, aspect=.7)
g.map_dataframe(sns.boxplot, x="sex", y="total_bill", hue="smoker", 
                palette=["#FFA7A0", "#ABEAC9"])
g.map_dataframe(sns.stripplot, x="sex", y="total_bill", hue="smoker", 
                palette=["#404040"], alpha=0.6, dodge=True)
# Note: the default legend does not show the correct entries.
#       Some fix-up step is required here...
# g.add_legend()
plt.show()
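One possible fix-up for the FacetGrid variant, sketched under assumptions rather than taken from the seaborn docs: pass order and hue_order explicitly to both passes so the mapping is the same in every facet, and build the legend from hand-made proxy patches via the legend_data argument of add_legend(). The data is again synthetic, and the proxy patches use the nominal palette colors, which may look slightly stronger than the boxes because boxplot desaturates its colors by default.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.patches import Patch

# Synthetic stand-in for the tips data (hypothetical values, for illustration only).
rng = np.random.default_rng(1)
n = 120
df = pd.DataFrame({
    "sex": rng.choice(["Male", "Female"], size=n),
    "smoker": rng.choice(["Yes", "No"], size=n),
    "time": rng.choice(["Lunch", "Dinner"], size=n),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=n),
})

# Fix the level order once and reuse it in both passes.
order = ["Male", "Female"]
hue_order = ["Yes", "No"]
box_colors = ["#FFA7A0", "#ABEAC9"]

g = sns.FacetGrid(df, col="time", col_order=["Lunch", "Dinner"],
                  height=4, aspect=.7)
g.map_dataframe(sns.boxplot, x="sex", y="total_bill", hue="smoker",
                order=order, hue_order=hue_order, palette=box_colors)
g.map_dataframe(sns.stripplot, x="sex", y="total_bill", hue="smoker",
                order=order, hue_order=hue_order,
                palette=["#404040", "#404040"], alpha=0.6, dodge=True)

# Proxy handles: one colored patch per hue level, independent of what
# the plotting functions registered on the axes.
legend_data = {label: Patch(facecolor=color, edgecolor="#404040")
               for label, color in zip(hue_order, box_colors)}
g.add_legend(legend_data=legend_data, title="smoker", label_order=hue_order)

# Read the entries back from the figure-level legend.
fig = g.axes.flat[0].figure
legend_labels = [text.get_text() for text in fig.legends[0].get_texts()]
print(legend_labels)
```

This keeps the legend limited to one entry per hue level, but it has to be kept in sync with the palette by hand, which is part of why the sns.catplot() route above is the more convenient one.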

Upvotes: 12
