Reputation: 109526
In Pandas, how can I modify groupby
to only take the first N items in the group?
Example
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2],
'values': [1, 2, 3, 4, 5, 6, 7]})
>>> df
id values
0 1 1
1 1 2
2 1 3
3 2 4
4 2 5
5 2 6
6 2 7
Desired functionality
# This doesn't work, but I am trying to return the first two items per group.
>>> df.groupby('id').first(2)
id values
0 1 1
1 1 2
3 2 4
4 2 5
What I've tried
I can perform a groupby and iterate through the groups to take the index of the first n
values, but there must be a simpler solution.
n = 2 # First two rows.
idx = [i for group in df.groupby('id').groups.itervalues() for i in group[:n]]
>>> df.ix[idx]
id values
0 1 1
1 1 2
3 2 4
4 2 5
Upvotes: 2
Views: 386
Reputation: 375445
You can use head
:
In [11]: df.groupby("id").head(2)
Out[11]:
id values
0 1 1
1 1 2
3 2 4
4 2 5
Note: In older versions this used to be equivalent to .apply(pd.DataFrame.head)
but it's more efficient since 0.15 (?), now it uses cumcount
under the hood.
Upvotes: 3