Reputation: 865
I have the following data frame:
bin_class = [0,1,1,1,0,0,0,0,1,1,0,0,0,0,0,0,0,1]
teams = ['A','B','B','A','A','B','B','A','A','B','B','A','A','B','B','A','B','B']
d = {'Team':teams,'Classification':bin_class}
df = pd.DataFrame(d)
Team Classification
0 A 0
1 B 1
2 B 1
3 A 1
4 A 0
5 B 0
6 B 0
7 A 0
8 A 1
9 B 1
10 B 0
11 A 0
12 A 0
13 B 0
14 B 0
15 A 0
16 B 0
17 B 1
I need to figure out the percentages of each bin_class for each team. i.e. of all rows on team A, what % were 0 and what % were 1? I've tried a few different methods that have all failed and been overly complicated, is there a simple way to do this?
Upvotes: 2
Views: 859
Reputation: 30920
ones=df.groupby(['Team']).sum()
long=df.groupby(['Team']).count()
percentages_ones=(ones/long)*100
percentages_zeros=((long-ones)/long)*100
percentages_ones.rename(columns=lambda x: x.replace('Classification', 'Percentage of ones'), inplace=True)
percentages_zeros.rename(columns=lambda x: x.replace('Classification', 'Percentages of zeros'), inplace=True)
percentages=pd.concat([percentages_zeros,percentages_ones],axis=1)
percentages
Output:
Percentages of zeros Percentage of ones
Team
A 75.0 25.0
B 60.0 40.0
Upvotes: 0
Reputation: 858
You can do groupby
on columns Team and Classification and then calculate percentage:
group_count = df.groupby(['Team', 'Classification']).agg(np.size)
Output:
Team Classification
A 0 12
1 4
B 0 12
1 8
group_percentage = group_count.groupby(level=0).apply(lambda x: 100 * x / float(x.sum()))
Output:
Team Classification
A 0 75.0
1 25.0
B 0 60.0
1 40.0
Upvotes: 1
Reputation: 476659
The percentage of ones is just the average of the Classification
, since there is only 0
and 1
here:
>>> df.groupby('Team').mean()
Classification
Team
A 0.25
B 0.40
Note that this will not work in case the Classification
column has values other than 0
and 1
.
Upvotes: 2
Reputation: 323236
Using crosstab
pd.crosstab(df.Team,df.Classification,normalize='index')
Out[498]:
Classification 0 1
Team
A 0.75 0.25
B 0.60 0.40
Upvotes: 6