Reputation: 105
How can I count the frequency of values in a column and calculate the percentage relative to the total?
I have a DataFrame:
range
0 G-L
1 M-R
2 G-L
3 M-R
4 A-F
5 S-Z
6 A-F
.. ..
.. ..
After df.range.value_counts() I get this:
A-F 1882
G-L 3096
M-R 3830
S-Z 1017
Now I want to get the percentage of each range relative to the total and show it in a plot, where the x-axis has the ranges (A-F, G-L, ...) and the y-axis shows the percentage of each range.
Upvotes: 2
Views: 5959
Reputation: 1131
Assume this is your DataFrame:
import pandas as pd
data = {'labels': ["A-F", "G-L", "M-R", "S-Z"], 'count': [1882, 3096, 3830, 1017]}
df = pd.DataFrame.from_dict(data)
print(df)
labels count
0 A-F 1882
1 G-L 3096
2 M-R 3830
3 S-Z 1017
Now you have to calculate the percentage of each row:
df['percentage'] = (df['count'] / df['count'].sum()) * 100
print(df)
labels count percentage
0 A-F 1882 19.155216
1 G-L 3096 31.511450
2 M-R 3830 38.982188
3 S-Z 1017 10.351145
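If you start from the raw column rather than from a hand-typed table, the same count/percentage table can be built directly from value_counts(). This is only a minimal sketch: the seven-row sample and the names raw and counts are made up for illustration, and sort_index() is used so the labels come out in alphabetical order.

```python
import pandas as pd

# Hypothetical sample standing in for the 'range' column from the question.
raw = pd.DataFrame({"range": ["G-L", "M-R", "G-L", "M-R", "A-F", "S-Z", "A-F"]})

# Count each label, order the labels alphabetically, and turn the
# resulting Series into a two-column DataFrame ('labels', 'count').
counts = (
    raw["range"]
    .value_counts()
    .sort_index()
    .rename_axis("labels")
    .reset_index(name="count")
)

# Share of each label relative to the total, expressed in percent.
counts["percentage"] = counts["count"] / counts["count"].sum() * 100
print(counts)
```

The resulting counts frame has the same three columns as the df above, so the same df.plot(kind='bar', x='labels', y='percentage') call should work on it.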
and then plot the labels vs. the percentage using the df.plot() function and specifying its kind, which I assume is a bar plot.
df.plot(kind='bar', x='labels', y='percentage')
This will produce a bar plot with the labels on the x-axis and the percentage on the y-axis.
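Edit:
The value_counts() method returns a pd.Series object. With normalize=True the values are fractions of the total (multiply by 100 for percentages). To plot it directly from your original DataFrame you can run the following line: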
df.range.value_counts(normalize=True).plot(kind='bar')
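Since normalize=True yields fractions rather than percentages, a slightly longer variant may be closer to what the question asks for. The following is a sketch under assumptions: the small sample DataFrame is invented, the Agg backend is chosen only so the script runs without a display, and the axis label text is my own choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Hypothetical sample standing in for the DataFrame from the question.
df = pd.DataFrame({"range": ["G-L", "M-R", "G-L", "M-R", "A-F", "S-Z", "A-F"]})

# Fractions of the total, ordered by label, scaled to percent.
pct = df["range"].value_counts(normalize=True).sort_index() * 100

# One bar per range; the Series index becomes the x-axis tick labels.
ax = pct.plot(kind="bar")
ax.set_xlabel("range")
ax.set_ylabel("percentage")
```

In an interactive session you would typically follow this with matplotlib.pyplot.show(), or call ax.figure.savefig(...) with a file name of your choice to write the plot to disk.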
Upvotes: 4