Reputation: 31
I have a data frame called 'train' with three columns: 'string', 'string length', and 'rank', where rank takes values from 0 to 4.
I want to create a histogram of the string length for each ranking and plot all of the histograms on one graph to compare. I am experiencing two issues with this:
The only way I can manage to do this is by creating separate datasets e.g. with the following type of code:
S0 = train.loc[train['rank'] == 0]
S1 = train.loc[train['rank'] == 1]
Then I create individual histograms for each dataset using:
plt.hist(S0['string length'], bins=100)
plt.show()
This code doesn't plot the density but instead plots the counts. How do I alter my code such that it plots density instead?
Is there also a way to do this without having to create separate datasets? I was told that my method is 'unpythonic'.
Upvotes: 0
Views: 2219
Reputation: 825
You could do something like:
df.loc[:, df.columns != 'string'].groupby('rank').hist(density=True, bins=10, figsize=(5, 5))
Basically, it selects all columns except string, groups them by rank, and makes a histogram of each remaining column for every group using the given arguments. Setting density=True normalizes each histogram so that its total area is 1, which plots density instead of counts.
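To make the normalization concrete, here is a small sketch using numpy.histogram, which applies the same density rule that matplotlib documents for hist: each bin's count is divided by the total count times the bin width. The lengths array is made-up data standing in for the 'string length' column, not anything from the question.

```python
import numpy as np

# Hypothetical string lengths, standing in for train['string length'].
rng = np.random.default_rng(0)
lengths = rng.integers(1, 50, size=200)

# Raw counts versus density for the same 10 bins.
counts, edges = np.histogram(lengths, bins=10)
density, _ = np.histogram(lengths, bins=10, density=True)

# Density is count / (total count * bin width), so the bar areas sum to 1.
widths = np.diff(edges)
manual = counts / (counts.sum() * widths)
area = float((density * widths).sum())

print(np.allclose(density, manual))
print(area)
```

Note that the bar heights themselves do not sum to 1 unless every bin has width 1; it is the area that is normalized.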
Hope this has helped.
EDIT:
If there are more variables and you want the histograms overlapped, try:
df.groupby('rank')['string length'].hist(density=True, histtype='step', bins=10, figsize=(5, 5))
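If you want a legend and identical bin edges for every rank, one possible variant is to loop over the groups and draw onto a single axes yourself. This is only a sketch: the train frame below is randomly generated stand-in data, and the bin count and output file name (rank_histograms.png) are arbitrary choices, not something from the question.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's 'train' data frame.
rng = np.random.default_rng(0)
train = pd.DataFrame({"rank": rng.integers(0, 5, size=500)})
train["string length"] = rng.integers(1, 100, size=500)

# Shared bin edges so the per-rank outlines are directly comparable.
lo, hi = train["string length"].min(), train["string length"].max()
bins = np.linspace(lo, hi, 21)

fig, ax = plt.subplots(figsize=(5, 5))
for rank, group in train.groupby("rank"):
    # One normalized outline per rank, all drawn on the same axes.
    ax.hist(group["string length"], bins=bins, density=True,
            histtype="step", label=f"rank {rank}")

ax.set_xlabel("string length")
ax.set_ylabel("density")
ax.legend()
fig.savefig("rank_histograms.png")
```

Because groupby yields each rank's rows in turn, there is no need to build S0, S1, and so on by hand.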
Upvotes: 0