Reputation: 619

Pandas groupby year filtering the dataframe by n largest values

I have a dataframe at hourly level with several columns. I want to extract the entire rows (containing all columns) of the 10 top values of a specific column for every year in my dataframe.

so far I ran the following code:

df = df.groupby([df.index.year])['totaldemand'].apply(lambda grp: grp.nlargest(10)))

The problem here is that I only get the top 10 values for each year of that specific column and I lose the other columns. How can I do this operation and having the corresponding values of the other columns that correspond to the top 10 values per year of my 'totaldemand' column?

Upvotes: 2

Answers (3)

Valdi_Bo

Reputation: 30991

nlargest can be applied to each group, passing the column to look for largest values.

So run:

df.groupby([df.index.year]).apply(lambda grp: grp.nlargest(3, 'totaldemand'))

Of course, in the final version replace 3 with your actual value.

Upvotes: 1

gosuto

Reputation: 5741

Get the index of your query and use it as a mask on your original df:

idx = df.groupby([df.index.year])['totaldemand'].apply(lambda grp: grp.nlargest(10))).index.to_list()
df.iloc[idx,]

(or something to that extend, I can't test now without any test data)

Upvotes: 0

BENY

Reputation: 323286

We usually do head after sort_values

df = df.sort_values('totaldemand',ascending = False).groupby([df.index.year])['totaldemand'].head(10)

Upvotes: 1

Pandas groupby year filtering the dataframe by n largest values

Answers (3)

Related Questions