pandas fill in 0 for non-existing categories in value_counts()

Question

Problem: I'm grouping the results in my DataFrame, computing value_counts(normalize=True), and trying to plot the result as a barplot.

The barplot should show relative frequencies. In some groups, some values don't occur. In that case, the corresponding count is not 0; the row simply doesn't exist. The barplot therefore ignores the missing 0, and the resulting bar is too high.

Example: Here is a minimal example that illustrates the problem. Let's say the DataFrame contains observations from experiments. Each time you run an experiment, a series of observations is collected. The result of the experiment is the set of relative frequencies of the observations collected for it.

import pandas as pd

df = pd.DataFrame()

df["id"] = [1]*3 + [2]*3 + [3]*3
df["experiment"] = ["a"]*6 + ["b"] * 3
df["observation"] = ["positive"]*3 + ["positive"]*2 + ["negative"]*1 + ["positive"]*2 + ["negative"]*1

There are two experiment types, "a" and "b".
Observations that belong to the same run of an experiment share the same id.

So here, experiment a has been done 2 times, experiment b just once.

I need to group by id and experiment, then average the result.

# Name the frequency column explicitly; its default name differs across pandas versions
plot_frame = df.groupby(["id", "experiment"])["observation"].value_counts(normalize=True)
plot_frame = plot_frame.rename("percentage").to_frame()

The resulting frame already shows the problem. The evaluation with id 1 has seen only positive observations, so the relative frequency of "negative" should be 0. Instead, that row doesn't exist. If I plot this, the corresponding bar is too high; the blue bars should add up to one:

import matplotlib.pyplot as plt
import seaborn as sns

sns.barplot(data=plot_frame.reset_index(),
            x="observation",
            hue="experiment",
            y="percentage")

plt.show()
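
To see where the extra height comes from, the arithmetic behind the bars can be reproduced without plotting. This is only a sketch: it assumes sns.barplot uses its default estimator (the mean) for each x/hue combination and a recent pandas version, and the names freq and bar_heights are introduced here for illustration.

```python
import pandas as pd

# Same data as in the question, built in one step
df = pd.DataFrame({
    "id": [1] * 3 + [2] * 3 + [3] * 3,
    "experiment": ["a"] * 6 + ["b"] * 3,
    "observation": ["positive"] * 3
    + ["positive"] * 2 + ["negative"]
    + ["positive"] * 2 + ["negative"],
})

# Relative frequencies per (id, experiment); categories that never occur get no row
freq = (
    df.groupby(["id", "experiment"])["observation"]
    .value_counts(normalize=True)
    .rename("percentage")
    .reset_index()
)

# Mean per (experiment, observation), i.e. the assumed bar heights
bar_heights = freq.groupby(["experiment", "observation"])["percentage"].mean()
print(bar_heights)
```

For experiment a, "negative" is averaged only over id 2 and comes out as 1/3 rather than (0 + 1/3) / 2 = 1/6, so the two bars for a sum to 7/6 instead of 1.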

mbh86 · Accepted Answer

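You can add the missing rows, filled with 0, by calling unstack with fill_value=0 and then stack. unstack moves the observation level into columns, so every group gets a cell for every category, and stack turns the result back into one row per combination. Try this: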

df.groupby(["id", "experiment"])["observation"].value_counts(normalize=True).unstack(fill_value=0).stack()
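
As a hedged sketch of how that line behaves on the example data from the question, assuming a recent pandas version (the variable names filled and plot_frame and the column name "percentage" are choices made here, not part of the answer):

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "id": [1] * 3 + [2] * 3 + [3] * 3,
    "experiment": ["a"] * 6 + ["b"] * 3,
    "observation": ["positive"] * 3
    + ["positive"] * 2 + ["negative"]
    + ["positive"] * 2 + ["negative"],
})

filled = (
    df.groupby(["id", "experiment"])["observation"]
    .value_counts(normalize=True)
    .unstack(fill_value=0)  # one column per observation; missing cells become 0
    .stack()                # back to one row per (id, experiment, observation)
)

# Long-format frame with an explicitly named value column, ready for plotting
plot_frame = filled.rename("percentage").reset_index()
print(plot_frame)
```

Every (id, experiment) group now has a row for both "positive" and "negative", so the frequencies within each group sum to 1 and the averaged bars per experiment do as well.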
