Merlin
Merlin

Reputation: 25629

Pandas: how to sum by groupby value

Using this:

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
         'Kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
         'Rank': [1, 2, 2, 3, 3,4 ,1 ,1,2 , 4,1,2],
         'Points':[876,789,863,673,741,812,756,788,694,701,804,690]}
df = pd.DataFrame(ipl_data)

df.groupby(['Team',"Rank"]).sum()

This is returned.

             Points
Team   Rank        
Devils 2        863
       3        673
Kings  1       1544
       3        741
       4        812
Riders 1        876
       2       2173
Royals 1        804
       4        701

How you I extract values (Points) where rank equals '1', so 1544+ 876+ 804. and the same for rank equals 2, and 3.

Upvotes: 5

Views: 3716

Answers (6)

jezrael
jezrael

Reputation: 862431

I believe need DataFrame.xs:

print (df.xs(1, level=1))

        Points
Team          
Kings     1544
Riders     876
Royals     804

print (df.xs(2, level=1))

        Points
Team          
Devils     863
Riders    2173

For select by multiple criteria use slicers:

idx = pd.IndexSlice
print (df.loc[idx[:, [1,2]], :])

             Points
Team   Rank        
Devils 2        863
Kings  1       1544
Riders 1        876
       2       2173
Royals 1        804

print (df.loc[idx['Riders', [1,2]], :])

             Points
Team   Rank        
Riders 1        876
       2       2173

If want sum all groups by Ranks change grouping columns from ['Team',"Rank"] to Rank:

s = df.groupby("Rank")['Points'].sum()
print (s)
Rank
1    3224
2    3036
3    1414
4    1513
Name: Points, dtype: int64

If need also df1 then use sum per level=1:

df1 = df.groupby(['Team',"Rank"]).sum()
print (df1)
             Points
Team   Rank        
Devils 2        863
       3        673
Kings  1       1544
       3        741
       4        812
Riders 1        876
       2       2173
Royals 1        804
       4        701

s1 = df1.sum(level=1)
print (s1)
      Points
Rank        
2       3036
3       1414
1       3224
4       1513

Upvotes: 4

niraj
niraj

Reputation: 18208

May be you can try swapping columns in groupby to ["Rank", "Team"]:

grouped = df.groupby(["Rank", "Team"]).sum()
print(grouped)

Result:

             Points
Rank Team          
1    Kings     1544
     Riders     876
     Royals     804
2    Devils     863
     Riders    2173
3    Devils     673
     Kings      741
4    Kings      812
     Royals     701

Then, to get sum for any rank, you can use loc. For e.g. for first rank would be:

grouped.loc[1].Points.sum()

Result:

3224

Upvotes: 1

Dan
Dan

Reputation: 45741

One option

>>> df_group = df.groupby(['Team',"Rank"]).sum().reset_index()
     Team  Rank  Points
0  Devils     2     863
1  Devils     3     673
2   Kings     1    1544
3   Kings     3     741
4   Kings     4     812
5  Riders     1     876
6  Riders     2    2173
7  Royals     1     804
8  Royals     4     701

and now you can just filter on 'Rank':

>>> df_group.loc[df_group['Rank']==1,'Points']
2    1544
5     876
7     804

Another options is grouping by Rank again and then aggregating as a list:

>>> df.groupby(['Team','Rank']).sum().reset_index().groupby('Rank')['Points'].agg(lambda x: list(x))
Rank
1    [1544, 876, 804]
2         [863, 2173]
3          [673, 741]
4          [812, 701]

Or maybe you just wanted to sort by rank, it's hard to tell because you haven't provided a desired output:

>>> df.groupby(['Team','Rank']).sum().reset_index().sort_values('Rank')
     Team  Rank  Points
2   Kings     1    1544
5  Riders     1     876
7  Royals     1     804
0  Devils     2     863
6  Riders     2    2173
1  Devils     3     673
3   Kings     3     741
4   Kings     4     812
8  Royals     4     701

Upvotes: 1

Scott Boston
Scott Boston

Reputation: 153460

I like using the axis argument in .loc:

df.groupby(['Team',"Rank"]).sum().loc(axis=0)[:,1]

Output:

             Points
Team   Rank        
Kings  1       1544
Riders 1        876
Royals 1        804

Or

df.groupby(['Team',"Rank"]).sum().loc(axis=0)[:,2]

             Points
Team   Rank        
Devils 2        863
Riders 2       2173

Or as @Jezrael has done without pd.Slicers:

df.groupby(['Team',"Rank"]).sum().loc(axis=0)[:,[1,2]]

             Points
Team   Rank        
Devils 2        863
Kings  1       1544
Riders 1        876
       2       2173
Royals 1        804

Upvotes: 1

javidcf
javidcf

Reputation: 59681

You can reorder by the rank after summing:

import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
         'Kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
         'Rank': [1, 2, 2, 3, 3,4 ,1 ,1,2 , 4,1,2],
         'Points':[876,789,863,673,741,812,756,788,694,701,804,690]}
df = pd.DataFrame(ipl_data)

result = df.groupby(['Team', 'Rank']).sum().swaplevel().sort_index()
# Or just:
result = df.groupby(['Rank', 'Team']).sum()

print(result)

Output:

Rank Team
1    Kings     1544
     Riders     876
     Royals     804
2    Devils     863
     Riders    2173
3    Devils     673
     Kings      741
4    Kings      812
     Royals     701

Upvotes: 1

JimPri
JimPri

Reputation: 1407

df[df['Rank'] == 1] # Filter by rank before summing

Upvotes: 1

Related Questions