Reputation: 137
Given this dataframe:
df
animal location number
0 lions zoo1 10
1 tigers zoo2 200
2 bears zoo2 300
3 ohmy zoo2 400
4 antelopes zoo1 20
5 orangutans zoo2 500
6 zebras zoo1 30
I need to find the top three animals by zoo, sum them independent of animal type, and then rank which zoo has the most animals. I've gotten to the point where I can select and group by location and animal, I think what I need to do now is create a series with the location as the key and the sum of each location as the value, but I'm stuck. (If that's not the best way to approach it, I'm open to suggestions.)
df.set_index(['location','number'], inplace=True)
df.groupby(level=[0,1]).sum().reset_index().sort_values(['location', 'number'], ascending=[1,0]).groupby('location').head(3)
location number animal
2 zoo1 30 zebras
1 zoo1 20 antelopes
0 zoo1 10 lions
6 zoo2 500 orangutans
5 zoo2 400 ohmy
4 zoo2 300 bears
(The sum()
doesn't seem to be doing anything in this context, but I have to have something for the groupby()
and I don't know if there's anything better.) What I'm looking for from here is:
zoo2 1200
zoo1 60
Code to create the initial df:
df = pd.DataFrame({'animal': ['lions', 'tigers', 'bears', 'ohmy', 'antelopes', 'orangutans','zebras'],
'location': ['zoo1', 'zoo2', 'zoo2', 'zoo2', 'zoo1', 'zoo2', 'zoo1'],
'number': [10, 200, 300, 400, 20, 500, 30]})
Upvotes: 0
Views: 43
Reputation: 30920
using your initial data frame:
df
animal location number
0 lions zoo1 10
1 tigers zoo2 200
2 bears zoo2 300
3 ohmy zoo2 400
4 antelopes zoo1 20
5 orangutans zoo2 500
6 zebras zoo1 30
Use groupby:
df.groupby('location')['number'].apply(lambda x: x.sort_values(ascending=False).head(3).sum())
Output:
location
zoo1 60
zoo2 1200
Name: number, dtype: int64
Upvotes: 0
Reputation: 3391
This can be done using groupby
:
df.groupby('location')['number'].sum()
will return this as output :
location
zoo1 60
zoo2 1200
Name: number, dtype: int64
Upvotes: 1