How to filter groupby for first N items

Question

In Pandas, how can I modify groupby to only take the first N items in the group?

Example

import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2],
                   'values': [1, 2, 3, 4, 5, 6, 7]})
>>> df
   id  values
0   1       1
1   1       2
2   1       3
3   2       4
4   2       5
5   2       6
6   2       7

Desired functionality

# This doesn't work, but I am trying to return the first two items per group.
>>> df.groupby('id').first(2)  
   id  values
0   1       1
1   1       2
3   2       4
4   2       5

What I've tried

I can perform a groupby and iterate through the groups to take the index of the first n values, but there must be a simpler solution.

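n = 2  # First two rows.
# Collect the first n index labels of each group (Python 3: .values(), not .itervalues()).
idx = [i for group in df.groupby('id').groups.values() for i in group[:n]]
>>> df.loc[idx]  # .ix has been removed from pandas; use label-based .loc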
   id  values
0   1       1
1   1       2
3   2       4
4   2       5

Andy Hayden · Accepted Answer

You can use head:

In [11]: df.groupby("id").head(2)
Out[11]:
   id  values
0   1       1
1   1       2
3   2       4
4   2       5

Note: In older versions this was equivalent to .apply(pd.DataFrame.head), but it has been more efficient since 0.15 (?); it now uses cumcount under the hood.
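
To illustrate the cumcount idea mentioned in the note, the same rows can be selected by hand by numbering the rows within each group and keeping those whose position is below n. This is only a sketch of the idea, not necessarily how pandas implements head internally; the variable names (position, via_head, via_cumcount) are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2],
                   'values': [1, 2, 3, 4, 5, 6, 7]})

n = 2  # Number of rows to keep per group.

# cumcount numbers the rows within each group, starting at 0.
position = df.groupby('id').cumcount()

# Two ways to keep the first n rows of every group.
via_head = df.groupby('id').head(n)
via_cumcount = df[position < n]

print(via_head)
print(via_head.equals(via_cumcount))
```

Both approaches keep the original index and row order, so the result lines up with the desired output in the question (index labels 0, 1, 3, 4).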
