How do I plot my histogram for density rather than count? (Matplotlib)

Question

I have a data frame called 'train' with a column 'string' and a column 'string length' and a column 'rank' which has ranking ranging from 0-4.

I want to create a histogram of the string length for each ranking and plot all of the histograms on one graph to compare. I am experiencing two issues with this:

The only way I can manage to do this is by creating separate datasets e.g. with the following type of code:

S0 = train.loc[train['rank'] == 0]
S1 = train.loc[train['rank'] == 1]

Then I create individual histograms for each dataset using:

plt.hist(S0['string length'], bins=100)
plt.show()

This code doesn't plot the density but instead plots the counts. How do I alter my code such that it plots density instead?

Is there also a way to do this without having to create separate datasets? I was told that my method is 'unpythonic'.

BCJuan · Accepted Answer

You could do something like:

df.loc[:, df.columns != 'string'].groupby('rank').hist(density=True, bins=10, figsize=(5, 5))

Basically, it selects every column except 'string', groups the rows by rank, and draws a histogram for each group using the arguments given.

Setting density=True draws each histogram normalized, so the bar heights are probability densities (the bar areas sum to 1) rather than raw counts.
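To see what the normalization does, here is a minimal sketch using np.histogram, which follows the same density definition as plt.hist. The data is made up for illustration and is not the asker's 'train' frame.

```python
import numpy as np

# Made-up string lengths, purely for illustration.
rng = np.random.default_rng(0)
lengths = rng.integers(1, 50, size=1000)

# Raw counts per bin versus normalized densities over the same bins.
counts, edges = np.histogram(lengths, bins=10)
density, _ = np.histogram(lengths, bins=10, density=True)

# With density=True, each height is count / (total * bin width),
# so the areas of the bars add up to 1.
widths = np.diff(edges)
area = float((density * widths).sum())
print(counts.sum(), area)
```

Because each rank's histogram is normalized separately, groups of very different sizes can still be compared on the same axis.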

Hope this has helped.

EDIT:

If there are more variables and you want the histograms overlapped, try:

df.groupby('rank')['string length'].hist(density=True, histtype='step', bins=10, figsize=(5, 5))
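If you want a legend and identical bin edges for every rank, one option is to loop over the groups with plain Matplotlib. This is only a sketch: the 'train' frame below is randomly generated stand-in data, and the choice of 20 shared bins and the output file name are my assumptions, not something from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for the 'train' frame described in the question.
rng = np.random.default_rng(0)
train = pd.DataFrame({
    "rank": rng.integers(0, 5, size=500),
    "string length": rng.integers(1, 100, size=500),
})

# Shared bin edges (20 bins, an arbitrary choice) so the curves line up.
col = train["string length"]
bins = np.linspace(col.min(), col.max(), 21)

fig, ax = plt.subplots(figsize=(5, 5))
# groupby yields (rank value, sub-frame) pairs, so no S0, S1, ... are needed.
for rank, group in train.groupby("rank"):
    ax.hist(group["string length"], bins=bins, density=True,
            histtype="step", label=f"rank {rank}")

ax.set_xlabel("string length")
ax.set_ylabel("density")
ax.legend()
fig.savefig("string_length_by_rank.png")  # or plt.show() in an interactive session
```

Passing the same bin edges to every call keeps the bars aligned, which the one-line groupby version does not guarantee because each group picks its own bin range.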
