Reputation: 2713
I've got a data frame that looks something along these lines:
Dog_breed Dog_name Points
============================================================
Monday Pug George 12
Tuesday Poodle Fido -15
Wednesday Pug Buddy 8
Thursday Dachshund Lyla -10
Friday Poodle Remi 9
Saturday Pug Kermit -5
Sunday Beagle Sara 3
For every breed of dog (i.e., category), I need to get the sum of the points — for some rows, it's positive, and for some, it's negative — and the number of dogs of that breed.
The goal is to add up all the points for each breed, count the number of times each breed occurs in the original data set, and end up with a data frame that looks like this:
Quantity Overall_point_change
============================================
Pug 3 15
Poodle 2 -6
Dachshund 1 -10
Beagle 1 3
Can pandas do this, or does it require a loop? Really struggling to wrap my head around this — can anyone recommend a way to set this up?
Upvotes: 2
Views: 3696
Reputation: 13955
DF.groupby('Dog_breed').agg({'Points': sum, 'Dog_name' : 'count'})
gives perhaps a slightly better outcome? Only because it specifies the aggregation function applied to each column.
Upvotes: 2
Reputation: 5467
You're looking for pandas' groupby
.
df.groupby('Dog_breed').agg(['count', 'sum'])
Read the following for a firmer understanding: http://pandas.pydata.org/pandas-docs/stable/groupby.html
Upvotes: 7