Reputation: 530
I have a dataframe like this, all categorical values:
col1 col2
0 A x
1 A y
2 A x
3 A z
4 A z
5 A z
6 B x
7 B y
8 B x
I want to group this by "col1" and get the percentage of time I get values of "col2" in separate columns. Like:
col1 x y z
0 A 33.33 16.67 50.0
1 B 66.67 33.37 0.0
I tried pivot table that gives me only count of each values of a column but how to get it in percentage?
Thanks in advance.
Upvotes: 2
Views: 2065
Reputation: 2724
You want to make a cross-tabulation of two factors (col1
and col2
) with the frequency normalized over each row. To do this you can use pd.crosstab()
with normalize
set to index
:
>> df = pd.DataFrame({'col1': list('aaaaaabbb'), 'col2': list('xyxzzzxyx')})
>> pd.crosstab(df['col1'], df['col2'], normalize='index') * 100
col2 x y z
col1
a 33.333333 16.666667 50.0
b 66.666667 33.333333 0.0
If you want to use multiple factors, just call crosstab
with a list of factors:
>> df['col3'] = list('112231345')
>> pd.crosstab([df['col1'], df['col3']], df['col2'], normalize='index') * 100
col2 x y z
col1 col3
a 1 33.333333 33.333333 33.333333
2 50.000000 0.000000 50.000000
3 0.000000 0.000000 100.000000
b 3 100.000000 0.000000 0.000000
4 0.000000 100.000000 0.000000
5 100.000000 0.000000 0.000000
If you want to round up, just call round
:
>> round(pd.crosstab(df['col1'], df['col2'], normalize='index') * 100, 2)
col2 x y z
col1
a 33.33 16.67 50.0
b 66.67 33.33 0.0
Upvotes: 3