Reputation: 607
I have a dataframe with a column whose values are 'A', 'B', 'C', 'D', ... This is just a grouping of some sort. I want to produce a histogram of the column values versus their counts.
import seaborn as sns
sns.distplot(dfGroupingWithoutNan['patient_group'])
This produced an error:
TypeError: unsupported operand type(s) for /: 'str' and 'int'
I thought that maybe, because I'm not familiar with distplot, I'm not using it the right way. I assumed I could just pass a Series into it and it would determine the count for each value and display them in the histogram accordingly.
Anyway, I thought of another solution, and this is what I came up with.
import pandas as pd

# Count how often each group occurs, then put labels and counts in a new dataframe
series1 = dfGroupingWithoutNan['patient_group'].value_counts()
dfPatientGroup = pd.DataFrame({'levels': series1.index, 'level_values': series1.values})
sns.set_style("whitegrid")
sns.barplot(x="levels", y="level_values", data=dfPatientGroup)
This time I was able to produce a plot of each value versus its count, though using a bar plot.
Is there any other way to do this, along the lines of how I expected distplot to work? Also, do I really need to create a new dataframe just to hold the values and their counts? Shouldn't it be possible for distplot to determine the counts automatically, without the hassle of creating a new dataframe?
Upvotes: 1
Views: 2550
Reputation: 1673
I would use a Counter to do this. The logic is very similar to what you are doing, but you don't need to create an extra dataframe:
from collections import Counter
cnt = Counter(dfGroupingWithoutNan.patient_group)
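# Convert the dict views to lists so seaborn receives plain sequences (needed on Python 3)
sns.barplot(x=list(cnt.keys()), y=list(cnt.values()))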
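As a self-contained sketch of the same idea, here is a version that runs without your dataframe. The sample values and the plot_counts helper are made up for illustration; in your case you would pass dfGroupingWithoutNan.patient_group to Counter instead of the list.

```python
from collections import Counter

# Hypothetical stand-in for dfGroupingWithoutNan['patient_group']
patient_group = ['A', 'B', 'A', 'C', 'B', 'A', 'D']

# Counter maps each distinct label to the number of times it occurs
cnt = Counter(patient_group)

# Sort the labels so the bars appear in a predictable order
labels = sorted(cnt)
counts = [cnt[label] for label in labels]


def plot_counts(labels, counts):
    """Draw one bar per label (hypothetical helper, not part of seaborn)."""
    import seaborn as sns  # imported here so the counting part runs without seaborn
    sns.set_style("whitegrid")
    return sns.barplot(x=labels, y=counts)
```

Calling plot_counts(labels, counts) should then draw the bars; sorting the labels first is optional and only fixes the bar order.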
I'm not aware of any solution that automatically handles string values in seaborn or matplotlib histograms.
Upvotes: 2