Reputation: 51
I want to scale the markers on a plot of two categorical variables by the count of observations.
I am using seaborn.pairplot for convenience because I have quite a lot of variables (features), but I don't think it has an argument for a case like this.
Upvotes: 2
Views: 2043
Reputation: 3630
I am guessing that what you are looking for is a balloon plot, also known as a matrix bubble chart or a categorical bubble plot. To my knowledge, seaborn does not provide this type of plot as of version 0.11.0, so using pairplot is currently not an option. I know of two functions that provide this type of plot, each displaying a single categorical-to-categorical relationship with a selected numerical variable setting the size of the markers: one in the pygal package, and catscatter. The downside is that both require the count of observations to be present as a column in your dataset, which I assume is not your case.
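If you do want to try one of those functions, the count column they expect can be derived with pandas. The following is a minimal sketch on a small made-up dataframe (the column names and values are assumptions for illustration only). It uses pd.crosstab so that combinations with zero observations are kept as explicit rows:

```python
import pandas as pd

# Made-up sample data; the column names and values are illustrative assumptions
df = pd.DataFrame({
    'who': ['man', 'woman', 'man', 'child', 'woman', 'man'],
    'town': ['A', 'B', 'A', 'A', 'B', 'B'],
})

# Cross-tabulate the two categorical variables, then reshape the table to
# long format with one row per (who, town) combination and a 'count' column
counts = pd.crosstab(df['who'], df['town']).stack().reset_index(name='count')
print(counts)
```

Whether the zero-count rows are useful depends on the plotting function; filter them out with counts[counts['count'] > 0] if they are not.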
Here is a way to create a balloon plot displaying the count of observations grouped by two categorical variables contained in a pandas dataframe:
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
import seaborn as sns # v 0.11.0
# Import seaborn sample dataset stored as a pandas dataframe and select
# the categorical variables to plot
df = sns.load_dataset('titanic')
x = 'who' # contains 3 unique values: 'child', 'man', 'woman'
y = 'embark_town' # contains 3 unique values: 'Southampton', 'Queenstown', 'Cherbourg'
# Compute the counts of observations
df_counts = df.groupby([x, y]).size().reset_index(name='count')
# Compute a size variable for the markers so that they have a good size regardless
# of the total count and the number of unique values in each categorical variable
scale = 500*df_counts['count'].size
size = df_counts['count']/df_counts['count'].sum()*scale
# Create matplotlib scatter plot with additional formatting
fig, ax = plt.subplots(figsize=(8,6))
ax.scatter(x, y, s=size, data=df_counts, zorder=2)
ax.grid(color='grey', linestyle='--', alpha=0.4, zorder=1)
ax.tick_params(length=0)
ax.set_frame_on(False)
ax.margins(.3)
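Since you mention having many variables, the same idea can be repeated for every pair of categorical columns to get something closer to what pairplot would give. This is only a rough sketch: balloon_grid is a hypothetical helper (not part of seaborn or matplotlib), the sample dataframe is made up, and the default scale value is an arbitrary assumption you will probably need to tune.

```python
import itertools

import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd


def balloon_grid(df, columns, scale=300):
    """Hypothetical helper: one balloon plot per pair of categorical columns."""
    pairs = list(itertools.combinations(columns, 2))
    fig, axes = plt.subplots(1, len(pairs), figsize=(5 * len(pairs), 4),
                             squeeze=False)
    for ax, (x, y) in zip(axes[0], pairs):
        # Count the observations for each observed (x, y) combination
        counts = df.groupby([x, y]).size().reset_index(name='count')
        # Marker area proportional to the share of observations, rescaled by
        # the number of combinations as in the single-plot code above
        size = counts['count'] / counts['count'].sum() * scale * len(counts)
        ax.scatter(counts[x], counts[y], s=size, zorder=2)
        ax.grid(color='grey', linestyle='--', alpha=0.4, zorder=1)
        ax.tick_params(length=0)
        ax.set_frame_on(False)
        ax.margins(.3)
        ax.set_xlabel(x)
        ax.set_ylabel(y)
    return fig, axes


# Made-up sample data; the column names and values are illustrative assumptions
sample = pd.DataFrame({
    'who': ['man', 'woman', 'man', 'child', 'woman', 'man'],
    'town': ['A', 'B', 'A', 'A', 'B', 'B'],
    'deck': ['upper', 'lower', 'lower', 'upper', 'upper', 'lower'],
})
fig, axes = balloon_grid(sample, ['who', 'town', 'deck'])
fig.savefig('balloon_grid.png')
```

All pairs are laid out in a single row to keep the sketch short; with many columns you would likely want to arrange the axes in a square grid instead.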
Sources of inspiration: catscatter, this answer
Upvotes: 1