Sum Values by Grouped Column

Question

I've been super close to figuring out how to sort my data frame the way that I want, I just don't know if there is a cleaner way to do this.

Let's say my data frame is defined as-

dic = {'firstname':['John','John','John','John','John','Susan','Susan',
                    'Susan','Susan','Susan','Mike','Mike','Mike','Mike',
                    'Mike'],
       'lastname':['Smith','Smith','Smith','Smith','Smith','Wilson',
                   'Wilson','Wilson','Wilson','Wilson','Jones','Jones',
                   'Jones','Jones','Jones'],
       'company':['KFC','BK','KFC','KFC','KFC','BK','BK','WND','WND',
                  'WND','TB','CHP','TB','CHP','TB'],
       'paid':[200,300,250,100,900,650,430,218,946,789,305,750,140,860,310],
       'overtime':[205,554,840,100,203,640,978,451,356,779,650,950,230,250,980]}
df = pd.DataFrame(dic)
print(df)

with output-

   firstname lastname company  paid  overtime
0       John    Smith     KFC   200       205
1       John    Smith      BK   300       554
2       John    Smith     KFC   250       840
3       John    Smith     KFC   100       100
4       John    Smith     KFC   900       203
5      Susan   Wilson      BK   650       640
6      Susan   Wilson      BK   430       978
7      Susan   Wilson     WND   218       451
8      Susan   Wilson     WND   946       356
9      Susan   Wilson     WND   789       779
10      Mike    Jones      TB   305       650
11      Mike    Jones     CHP   750       950
12      Mike    Jones      TB   140       230
13      Mike    Jones     CHP   860       250
14      Mike    Jones      TB   310       980

my goal is to find out if someone makes more than 1,300 at a company excluding overtime (so just the paid column)

this is what I've attempted-

df = df.groupby(['lastname', 'firstname','company']).sum()
s = df['paid']>1300
df['limit']=s
df = df.loc[df['limit']==True]
del df['limit']
df = df.sort_values(by=['paid'],ascending=False)
print(df)

with output result-

                            paid  overtime
lastname firstname company                
Wilson   Susan     WND      1953      1586
Jones    Mike      CHP      1610      1200
Smith    John      KFC      1450      1348

I'm looking for help to potentially clean up my work, but also with a few questions.

Why is the output column for paid and overtime raised one? Can I shift that back down? Also, I only want to sum the paid column and nothing else (It's ok if the overtime column doesn't appear in the final output)

This is more what I'm looking for-

lastname firstname company paid              
Wilson   Susan     WND     1953
Jones    Mike      CHP     1610
Smith    John      KFC     1450

NYC Coder · Accepted Answer

Change this line, use as_index=False and agg:

df = df.groupby(['lastname', 'firstname','company'], as_index=False).agg({'paid': 'sum'})

  lastname firstname company  paid
5   Wilson     Susan     WND  1953
0    Jones      Mike     CHP  1610
3    Smith      John     KFC  1450

Sum Values by Grouped Column

Answers (1)

Related Questions