thomas.mac

Reputation: 1256

Preserving original index when using pandas groupby

I have the following DataFrame. I want to group it by year and return the maximum value for each year, while keeping the original date index values:

import pandas as pd

dct = {
"date": ["2019-01-01", "2019-04-01", "2020-01-01"],
"high": [100, 150, 100],
}
df = pd.DataFrame(dct)
df.set_index("date", inplace=True)
df.index = [pd.Timestamp(i) for i in df.index]
df.index.name = "date"

# date         high
# 2019-01-01   100
# 2019-04-01   150
# 2020-01-01   100

With pandas groupby I can group the rows by year, but the result is indexed by year rather than by the original dates:

func = lambda x: x.year
df["high"].groupby(func).max()

# date    high
# 2019    150
# 2020    100

My desired output, still using pandas groupby, is:

# NOTE: the date index is the same as in the original

# date         high
# 2019-04-01   150
# 2020-01-01   100

Upvotes: 0

Views: 1738

Answers (3)

Andy L.

Reputation: 25239

You could also use nlargest and droplevel:

func = lambda x: x.year

df["high"].groupby(func).nlargest(1).droplevel(0)

Out[7]:
date
2019-04-01    150
2020-01-01    100
Name: high, dtype: int64
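
To make the role of droplevel clearer, here is a minimal sketch that splits the one-liner into two steps. The frame is rebuilt from the question's data; the names `per_year` and `result` are only illustrative.

```python
import pandas as pd

# Same data as in the question, built with a DatetimeIndex named "date".
df = pd.DataFrame(
    {"high": [100, 150, 100]},
    index=pd.DatetimeIndex(["2019-01-01", "2019-04-01", "2020-01-01"], name="date"),
)

func = lambda x: x.year

# nlargest(1) keeps the top row of each group, but prefixes the index
# with the group key, giving a two-level index: (year, original date).
per_year = df["high"].groupby(func).nlargest(1)

# droplevel(0) removes the outer year level, leaving the original dates.
result = per_year.droplevel(0)
print(result)
```

The intermediate `per_year` still contains the year, so you can keep it if you need both the year and the date in the result.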

Upvotes: 0

BENY

Reputation: 323226

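Use sort_values, then groupby with tail:

# Group with a function applied to each index label, so the key follows the sorted rows.
# (An array such as df.index.year is matched by position and can misalign after sorting.)
df.sort_values('high').groupby(lambda d: d.year).tail(1)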
            high
date            
2020-01-01   100
2019-04-01   150

df["high"].groupby(func).max() is an aggregation: it returns one value per group, labelled by the group key (here the year), so the original date index is not carried over. tail(1), by contrast, selects existing rows, which keep their index labels.
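
As a rough illustration of that difference, the sketch below runs both approaches on the question's data. It sorts first and groups on the sorted frame's own index, then calls sort_index() to restore chronological order; that last step and the variable names are my additions, not part of the answer above.

```python
import pandas as pd

# Same data as in the question, built with a DatetimeIndex named "date".
df = pd.DataFrame(
    {"high": [100, 150, 100]},
    index=pd.DatetimeIndex(["2019-01-01", "2019-04-01", "2020-01-01"], name="date"),
)

# Aggregation: one value per group, labelled by the group key (the year).
agg = df["high"].groupby(df.index.year).max()

# Row selection: sort by value, group the *sorted* frame by its own years,
# and keep the last (largest) row of each group. The date labels survive.
sorted_df = df.sort_values("high")
rows = sorted_df.groupby(sorted_df.index.year).tail(1).sort_index()

print(agg)
print(rows)
```

Here `agg` is indexed by 2019 and 2020, while `rows` is indexed by the original dates.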

Upvotes: 2

Quang Hoang

Reputation: 150735

Another way is to use idxmax with loc:

df.loc[df.groupby(df.index.year).high.idxmax()]

Output:

            high
date            
2019-04-01   150
2020-01-01   100
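
The one-liner can be read as two steps, sketched below on the question's data with illustrative variable names. This assumes the date index is unique; with duplicate dates, loc would return every row that shares a selected label. If a year has tied maxima, idxmax returns the first occurrence.

```python
import pandas as pd

# Same data as in the question, built with a DatetimeIndex named "date".
df = pd.DataFrame(
    {"high": [100, 150, 100]},
    index=pd.DatetimeIndex(["2019-01-01", "2019-04-01", "2020-01-01"], name="date"),
)

# Step 1: for each year, find the index label (a date) of the row
# holding that year's maximum "high".
labels = df.groupby(df.index.year)["high"].idxmax()

# Step 2: look those labels up in the original frame to get full rows.
result = df.loc[labels]
print(result)
```

Because loc returns whole rows, this also carries along any other columns the frame may have.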

Upvotes: 1
