Reputation: 3380
I am trying to plot subcategories as stacked bars using countplot. The problem I have is that the stacked bars don't show all the categories.
import seaborn as sns
from matplotlib import pyplot
flatui = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
fig, ax = pyplot.subplots(figsize=(20,15))
# plot_data is the DataFrame described below (LV1 = upper category, LV2 = subcategory)
g = sns.countplot(ax=ax,
                  y="LV1",
                  hue="LV2",
                  # several seaborn palettes concatenated into one list of colours
                  palette=sns.color_palette("hls", 8) + sns.color_palette("Paired") + sns.color_palette(flatui),
                  dodge=False,
                  data=plot_data)
g.legend(loc='center left', bbox_to_anchor=(1, 0.6), ncol=3)
Description of the DataFrame content:
LV1 is a column that contains the upper category and LV2 is the subcategory. In the example here, the plot shows only two subcategories for R, but that is not the case: R has 21, where the most frequent has 20 occurrences and the second and third have 9 occurrences each.
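Before plotting, it can help to confirm how many subcategories each upper category really has, so you know what the plot ought to show. A minimal sketch with pandas; the small `plot_data` frame below is hypothetical and only mirrors the LV1/LV2 column names from the question.

```python
import pandas as pd

# Hypothetical stand-in for plot_data, using the question's column names.
plot_data = pd.DataFrame({
    "LV1": ["R", "R", "R", "R", "S", "S"],
    "LV2": ["r1", "r1", "r2", "r3", "s1", "s1"],
})

# Number of distinct subcategories (LV2) per upper category (LV1).
n_sub = plot_data.groupby("LV1")["LV2"].nunique()
print(n_sub)

# Occurrences of each subcategory, most frequent first within each LV1 group.
counts = plot_data.groupby("LV1")["LV2"].value_counts()
print(counts)
```

On the real data, `n_sub["R"]` should report 21 if the description above is right.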
Upvotes: 2
Views: 2431
Reputation: 107567
The bars are likely overlapping each other: since you call dodge=False, every subcategory of an upper category is drawn at the same position, so a larger bar covers the smaller bars drawn before it. If you limit plot_data to just the R category and use dodge=True, all subcategories should then be present. Since count plots do not stack (a count plot is more or less a histogram of a categorical variable), consider a stacked bar graph instead.
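A minimal sketch of such a stacked bar graph, assuming pandas' own plotting is acceptable instead of seaborn: `pd.crosstab` counts every LV1/LV2 combination, and `DataFrame.plot(kind="barh", stacked=True)` draws the segments end to end so none can hide another. The `plot_data` frame is hypothetical and only reuses the question's column names.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data with the question's column names.
plot_data = pd.DataFrame({
    "LV1": ["R", "R", "R", "R", "S", "S", "T"],
    "LV2": ["r1", "r1", "r2", "r3", "s1", "s2", "t1"],
})

# Rows: upper categories; columns: subcategories; cells: counts (0 if absent).
counts = pd.crosstab(plot_data["LV1"], plot_data["LV2"])

fig, ax = plt.subplots(figsize=(8, 4))
# stacked=True starts each subcategory's segment where the previous one ends.
counts.plot(kind="barh", stacked=True, ax=ax, legend=False)
ax.legend(loc="center left", bbox_to_anchor=(1, 0.6), ncol=3)
```

If you want the question's colours, the concatenated palette list should be accepted through the `color=` argument of `DataFrame.plot`.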
To demonstrate, see the following reproducible example:
Data
import numpy as np
import pandas as pd
from matplotlib import pyplot
import seaborn as sns
### DATA BUILD
data_tools = ['sas', 'stata', 'spss', 'python', 'r', 'julia']
np.random.seed(12220)
random_df = pd.DataFrame({'group': np.random.choice(data_tools, 500),
                          'int': np.random.randint(1, 10, 500)})
First Plot (see how only the large 'stata' bar at int=6 shows)
flatui = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
fig, ax = pyplot.subplots(figsize=(8,4))
g = sns.countplot(ax=ax,
                  y="int",
                  hue="group",
                  palette=(sns.color_palette("hls", 8) +
                           sns.color_palette("Paired") +
                           sns.color_palette(flatui)),
                  dodge=False,
                  data=random_df)
g.legend(loc='upper center', ncol=3)
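To check that the smaller bars in the first plot are hidden rather than missing, one can inspect the rectangles seaborn actually drew. This sketch groups them by vertical centre: with dodge=False several bars share one centre (they overlap), while with dodge=True each bar gets its own. The helper `max_bars_per_centre` is a hypothetical name of my own; zero-height patches are skipped because some seaborn versions add invisible legend rectangles to the axes.

```python
import collections

import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Same data build as in the answer.
data_tools = ['sas', 'stata', 'spss', 'python', 'r', 'julia']
np.random.seed(12220)
random_df = pd.DataFrame({'group': np.random.choice(data_tools, 500),
                          'int': np.random.randint(1, 10, 500)})


def max_bars_per_centre(ax):
    """Hypothetical helper: largest number of visible bars sharing one y centre."""
    centres = collections.Counter(
        round(p.get_y() + p.get_height() / 2, 6)
        for p in ax.patches
        if p.get_height() > 0  # skip zero-size legend proxies
    )
    return max(centres.values())


fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
sns.countplot(ax=ax1, y="int", hue="group", dodge=False, data=random_df)
sns.countplot(ax=ax2, y="int", hue="group", dodge=True, data=random_df)

overlap_no_dodge = max_bars_per_centre(ax1)  # several bars drawn on one spot
overlap_dodge = max_bars_per_centre(ax2)     # one bar per spot
print(overlap_no_dodge, overlap_dodge)
```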
Second Plot (shows the other categories, not only the largest 'stata' bar, at int=6)
fig, ax = pyplot.subplots(figsize=(8,4))
g = sns.countplot(ax=ax,
                  y="int",
                  hue="group",
                  palette=(sns.color_palette("hls", 8) +
                           sns.color_palette("Paired") +
                           sns.color_palette(flatui)),
                  dodge=True,                        # CHANGING DODGE PARAM
                  data=random_df.query("int==6"))    # FILTERING DATA
g.legend(loc='upper center', ncol=3)
Upvotes: 3