SGer

Reputation: 552

Pandas: Get top 10 values AFTER grouping

I have a pandas data frame with a column 'id' and a column 'value'. It is already sorted first by id (ascending) and then by value (descending). What I need is the top 10 values per id.

I assumed that something like the following would work, but it doesn't:

df.groupby("id", as_index=False).aggregate(lambda (index,rows) : rows.iloc[:10])

What I get is just a list of ids; the value column (and the other columns that I omitted for the question) are no longer there.

Any ideas how it might be done, without iterating through each of the single rows and appending the first ten to another data structure?

Upvotes: 1

Views: 4673

Answers (2)

Roopa Sudhakar

Reputation: 39

I would like to answer this with an example dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([['a','a','b','c','a','c','b'],[4,6,1,8,9,4,1],[12,11,7,1,5,5,7],[123,54,146,96,10,114,200]]).T,columns=['item','date','hour','value'])
df['value'] = pd.to_numeric(df['value'])

This gives you the following dataframe:

item    date    hour    value
a   4   12  123
a   6   11  54
b   1   7   146
c   8   1   96
a   9   5   10
c   4   5   114
b   1   7   200

Grouping by item and calling head(2) then returns the first 2 values of each group:

df.groupby(['item'])['value'].head(2)
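As a rough sketch of what that call returns, here is the same example reduced to the 'item' and 'value' columns (the names first_two and first_two_rows are only illustrative). Note that head(2) takes the first two rows of each group in their existing row order and does not sort, and that selecting ['value'] yields a Series keyed by the original row index. Calling head on the grouped frame instead keeps every column.

```python
import pandas as pd

# Same items and values as the example above; 'date' and 'hour' are left out for brevity
df = pd.DataFrame({
    "item": ["a", "a", "b", "c", "a", "c", "b"],
    "value": [123, 54, 146, 96, 10, 114, 200],
})

# First two values of each item, in existing row order (no sorting happens here).
# The result is a Series indexed by the original row labels.
first_two = df.groupby("item")["value"].head(2)

# Calling head on the grouped frame keeps all columns, including 'item'
first_two_rows = df.groupby("item").head(2)
```

If the frame is not already ordered by value, it would need to be sorted before grouping for these to be the largest values.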

Upvotes: 1

Colin

Reputation: 2137

Is this what you're looking for?

df.groupby('id').head(10)
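A minimal sketch of how this plays out, assuming made-up data with the 'id' and 'value' columns from the question (the frame contents and the names df_sorted and top10 are hypothetical). The explicit sort only reproduces the ordering the question says is already in place; head(10) then keeps the first 10 rows of each id with all columns intact.

```python
import pandas as pd

# Hypothetical data: two ids with 12 values each
df = pd.DataFrame({
    "id": [1] * 12 + [2] * 12,
    "value": list(range(12)) + list(range(100, 112)),
})

# Reproduce the ordering described in the question: id ascending, value descending
df_sorted = df.sort_values(["id", "value"], ascending=[True, False])

# Keep the first 10 rows of each id; since values are sorted descending
# within each id, these are the 10 largest values per id
top10 = df_sorted.groupby("id").head(10)
```

Unlike aggregate, which expects one result per column per group, head simply filters rows, so the remaining columns stay in the result.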

Upvotes: 2
