carousallie
carousallie

Reputation: 865

Calculate % Unique Value Grouped by Unique Value of Other Column

I have the following data frame:

bin_class = [0,1,1,1,0,0,0,0,1,1,0,0,0,0,0,0,0,1]
teams = ['A','B','B','A','A','B','B','A','A','B','B','A','A','B','B','A','B','B']
d = {'Team':teams,'Classification':bin_class}
df = pd.DataFrame(d)

Team    Classification
0   A   0
1   B   1
2   B   1
3   A   1
4   A   0
5   B   0
6   B   0
7   A   0
8   A   1
9   B   1
10  B   0
11  A   0
12  A   0
13  B   0
14  B   0
15  A   0
16  B   0
17  B   1

I need to figure out the percentages of each bin_class for each team. i.e. of all rows on team A, what % were 0 and what % were 1? I've tried a few different methods that have all failed and been overly complicated, is there a simple way to do this?

Upvotes: 2

Views: 859

Answers (4)

ansev
ansev

Reputation: 30920

ones=df.groupby(['Team']).sum()
long=df.groupby(['Team']).count()
percentages_ones=(ones/long)*100
percentages_zeros=((long-ones)/long)*100
percentages_ones.rename(columns=lambda x: x.replace('Classification', 'Percentage of ones'), inplace=True)
percentages_zeros.rename(columns=lambda x: x.replace('Classification', 'Percentages of zeros'), inplace=True)
percentages=pd.concat([percentages_zeros,percentages_ones],axis=1)
percentages

Output:

     Percentages of zeros   Percentage of ones
Team        
A    75.0                   25.0
B    60.0                   40.0

Upvotes: 0

stahamtan
stahamtan

Reputation: 858

You can do groupby on columns Team and Classification and then calculate percentage:

group_count = df.groupby(['Team', 'Classification']).agg(np.size)

Output:
Team  Classification
A     0                 12
      1                  4
B     0                 12
      1                  8


group_percentage = group_count.groupby(level=0).apply(lambda x: 100 * x / float(x.sum()))

Output:
Team  Classification
A     0                 75.0
      1                 25.0
B     0                 60.0
      1                 40.0

Upvotes: 1

willeM_ Van Onsem
willeM_ Van Onsem

Reputation: 476659

The percentage of ones is just the average of the Classification, since there is only 0 and 1 here:

>>> df.groupby('Team').mean()
      Classification
Team                
A               0.25
B               0.40

Note that this will not work in case the Classification column has values other than 0 and 1.

Upvotes: 2

BENY
BENY

Reputation: 323236

Using crosstab

pd.crosstab(df.Team,df.Classification,normalize='index')
Out[498]: 
Classification     0     1
Team                      
A               0.75  0.25
B               0.60  0.40

Upvotes: 6

Related Questions