Reputation: 147
I've been super close to figuring out how to sort my data frame the way that I want, I just don't know if there is a cleaner way to do this.
Let's say my data frame is defined as-
dic = {'firstname':['John','John','John','John','John','Susan','Susan',
'Susan','Susan','Susan','Mike','Mike','Mike','Mike',
'Mike'],
'lastname':['Smith','Smith','Smith','Smith','Smith','Wilson',
'Wilson','Wilson','Wilson','Wilson','Jones','Jones',
'Jones','Jones','Jones'],
'company':['KFC','BK','KFC','KFC','KFC','BK','BK','WND','WND',
'WND','TB','CHP','TB','CHP','TB'],
'paid':[200,300,250,100,900,650,430,218,946,789,305,750,140,860,310],
'overtime':[205,554,840,100,203,640,978,451,356,779,650,950,230,250,980]}
df = pd.DataFrame(dic)
print(df)
with output-
firstname lastname company paid overtime
0 John Smith KFC 200 205
1 John Smith BK 300 554
2 John Smith KFC 250 840
3 John Smith KFC 100 100
4 John Smith KFC 900 203
5 Susan Wilson BK 650 640
6 Susan Wilson BK 430 978
7 Susan Wilson WND 218 451
8 Susan Wilson WND 946 356
9 Susan Wilson WND 789 779
10 Mike Jones TB 305 650
11 Mike Jones CHP 750 950
12 Mike Jones TB 140 230
13 Mike Jones CHP 860 250
14 Mike Jones TB 310 980
my goal is to find out if someone makes more than 1,300 at a company excluding overtime (so just the paid column)
this is what I've attempted-
df = df.groupby(['lastname', 'firstname','company']).sum()
s = df['paid']>1300
df['limit']=s
df = df.loc[df['limit']==True]
del df['limit']
df = df.sort_values(by=['paid'],ascending=False)
print(df)
with output result-
paid overtime
lastname firstname company
Wilson Susan WND 1953 1586
Jones Mike CHP 1610 1200
Smith John KFC 1450 1348
I'm looking for help to potentially clean up my work, but also with a few questions.
Why is the output column for paid and overtime raised one? Can I shift that back down? Also, I only want to sum the paid column and nothing else (It's ok if the overtime column doesn't appear in the final output)
This is more what I'm looking for-
lastname firstname company paid
Wilson Susan WND 1953
Jones Mike CHP 1610
Smith John KFC 1450
Upvotes: 0
Views: 40
Reputation: 7604
Change this line, use as_index=False
and agg
:
df = df.groupby(['lastname', 'firstname','company'], as_index=False).agg({'paid': 'sum'})
lastname firstname company paid
5 Wilson Susan WND 1953
0 Jones Mike CHP 1610
3 Smith John KFC 1450
Upvotes: 1