deltascience

Reputation: 3380

seaborn countplot doesn't show all categories

I am trying to plot subcategories as stacked bars using countplot. The problem is that the stacked bars don't show all the categories.

import seaborn as sns
from matplotlib import pyplot

# plot_data: DataFrame with columns LV1 (category) and LV2 (subcategory)
flatui = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]

fig, ax = pyplot.subplots(figsize=(20, 15))
g = sns.countplot(ax=ax,
                  y="LV1",
                  hue="LV2",
                  palette=(sns.color_palette("hls", 8) +
                           sns.color_palette("Paired") +
                           sns.color_palette(flatui)),
                  dodge=False,
                  data=plot_data)
g.legend(loc='center left', bbox_to_anchor=(1, 0.6), ncol=3)

Description of the dataframe content:

LV1 is a column that contains the upper category and LV2 is the subcategory. For example, in my plot R appears to have only two subcategories, but that is not the case: it has 21, of which the most frequent has 20 occurrences and the second and third have 9 occurrences each.
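To confirm that the subcategories really are in the data even though the plot hides them, they can be counted directly with pandas. This is a minimal sketch; the small `plot_data` frame below is a hypothetical stand-in that only mimics the LV1/LV2 layout described above.

```python
import pandas as pd

# Hypothetical stand-in for plot_data: LV1 = category, LV2 = subcategory
plot_data = pd.DataFrame({
    "LV1": ["R", "R", "R", "R", "R", "S", "S"],
    "LV2": ["a", "a", "b", "c", "c", "x", "y"],
})

# Number of distinct subcategories per category
n_sub = plot_data.groupby("LV1")["LV2"].nunique()
print(n_sub)

# Occurrences of each subcategory within its category, most frequent first
counts = plot_data.groupby("LV1")["LV2"].value_counts()
print(counts)
```

On the real data, the `nunique()` result for R should show 21 if the description above is right.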

Upvotes: 2

Views: 2431

Answers (1)

Parfait

Reputation: 107567

Most likely the bars are overlapping. Because you pass dodge=False, every hue level is drawn at the same position, each on top of the previous ones rather than stacked, so a bar drawn later can hide shorter bars beneath it. If you filter plot_data to just the R category and set dodge=True, all subcategories should then appear. Since countplot does not stack bars, consider a stacked bar graph instead; a count plot is more or less a histogram of a categorical variable.
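To see why overlapping hides bars, here is a minimal matplotlib sketch (plain `ax.barh` calls, not seaborn's own drawing code) using the counts from the question: one subcategory with 20 occurrences and two with 9 each. The subcategory names `sub1`..`sub3` are made up. When overlapping, every bar starts at 0, so the two 9-count bars coincide exactly and the later one covers the earlier one completely; this could help explain why fewer colours than expected show up for R. Stacking instead offsets each bar by the running total.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
from matplotlib import pyplot

# Counts taken from the question's R category (names are hypothetical)
counts = {"sub1": 20, "sub2": 9, "sub3": 9}

fig, (ax_overlap, ax_stack) = pyplot.subplots(1, 2, figsize=(8, 2))

# Overlapping (what dodge=False does): every bar starts at 0 on the same row,
# so each later bar is painted over the earlier ones
for name, n in counts.items():
    ax_overlap.barh(0, n, label=name)

# Stacked: each bar starts where the previous one ended
left = 0
for name, n in counts.items():
    ax_stack.barh(0, n, left=left, label=name)
    left += n

ax_overlap.set_title("overlapping")
ax_stack.set_title("stacked")
```

The overlapping row only ever reaches 20 (the longest single bar), while the stacked row reaches 38, the total count.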

To demonstrate, see the following, reproducible example:

Data

import numpy as np
import pandas as pd

from matplotlib import pyplot
import seaborn as sns

### DATA BUILD
data_tools = ['sas', 'stata', 'spss', 'python', 'r', 'julia']
np.random.seed(12220)
random_df = pd.DataFrame({'group': np.random.choice(data_tools, 500),
                          'int': np.random.randint(1, 10, 500)})

First Plot (see how only the large 'stata' bar at int=6 shows)

flatui = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]

fig, ax = pyplot.subplots(figsize=(8,4))

g = sns.countplot(ax=ax,
                  y="int",
                  hue="group",
                  palette=(sns.color_palette("hls", 8) + 
                           sns.color_palette("Paired") + 
                           sns.color_palette(flatui)),
                  dodge=False,
                  data=random_df)

g.legend(loc='upper center', ncol=3)

First Plot

Second Plot (shows other categories beyond only the largest bar at 'stata' for int=6)

fig, ax = pyplot.subplots(figsize=(8,4))

g = sns.countplot(ax=ax,
                  y="int",
                  hue="group",
                  palette=(sns.color_palette("hls", 8) + 
                           sns.color_palette("Paired") + 
                           sns.color_palette(flatui)),
                  dodge=True,                              # CHANGING DODGE PARAM
                  data=random_df.query("int==6"))          # FILTERING DATA

g.legend(loc='upper center', ncol=3)

Second Plot
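The stacked bar graph suggested above can be sketched with pandas instead of countplot: `pd.crosstab` counts rows per (int, group) pair, and `DataFrame.plot(kind='barh', stacked=True)` draws one segment per group, each starting where the previous one ends. This is one possible approach under those assumptions, not the only one; the `Agg` backend line is there only so the sketch runs without a display. For the question's data the equivalent table would be `pd.crosstab(plot_data['LV1'], plot_data['LV2'])`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd
from matplotlib import pyplot

# Same random data as in the reproducible example above
data_tools = ['sas', 'stata', 'spss', 'python', 'r', 'julia']
np.random.seed(12220)
random_df = pd.DataFrame({'group': np.random.choice(data_tools, 500),
                          'int': np.random.randint(1, 10, 500)})

# One row per int value, one column per group, cells = counts
counts = pd.crosstab(random_df['int'], random_df['group'])

# Stacked horizontal bars: each group's segment starts where the previous ends
fig, ax = pyplot.subplots(figsize=(8, 4))
counts.plot(kind='barh', stacked=True, ax=ax)
ax.set_xlabel('count')
ax.legend(loc='upper center', ncol=3)
```

Because the segments are laid end to end, the full length of each bar equals the total count for that int value, and every group with a non-zero count gets a visible segment.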

Upvotes: 3
