Reputation: 1256
I have the following DataFrame. I want to group it by year and return the maximum value for each year, while keeping the original date index values:
import pandas as pd
dct = {
"date": ["2019-01-01", "2019-04-1", "2020-01-01"],
"high": [100, 150, 100],
}
df = pd.DataFrame(dct)
df.set_index("date",inplace=True)
df.index = [pd.Timestamp(i) for i in df.index]
df.index.name = "date"
# date high
# 2019-01-01 100
# 2019-04-01 150
# 2020-01-01 100
When using pandas groupby, I'm able to group the rows by year, but I don't get the date index that I want:
func = lambda x: x.year
df["high"].groupby(func).max()
# date high
# 2019 150
# 2020 100
My desired output is to use pandas groupby and get:
# NOTE : the date index is like the original
# date high
# 2019-04-01 150
# 2020-01-01 100
Upvotes: 0
Views: 1738
Reputation: 25239
You could also use nlargest and droplevel:
func = lambda x: x.year
df["high"].groupby(func).nlargest(1).droplevel(0)
Out[7]:
date
2019-04-01 150
2020-01-01 100
Name: high, dtype: int64
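To see why droplevel is needed, here is a small sketch that rebuilds the question's data and inspects the intermediate result. The variable names `top` and `result` are mine, not part of the answer. The grouped nlargest returns a Series with a two-level index: the outer level is the group key (the year) and the inner level is the original date, so dropping level 0 leaves only the dates.

```python
import pandas as pd

# Same data as in the question, built with a DatetimeIndex directly.
df = pd.DataFrame(
    {"high": [100, 150, 100]},
    index=pd.DatetimeIndex(["2019-01-01", "2019-04-01", "2020-01-01"], name="date"),
)

# Largest value per year; the result is indexed by (year, original date).
top = df["high"].groupby(lambda d: d.year).nlargest(1)
print(top.index.nlevels)

# Drop the outer (year) level so that only the original dates remain.
result = top.droplevel(0)
print(result)
```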
Upvotes: 0
Reputation: 323226
Use sort_values, then groupby with tail:
# Group by a function of the index so the year keys follow the re-sorted rows
df.sort_values('high').groupby(lambda x: x.year).tail(1)
high
date
2020-01-01 100
2019-04-01 150
When you call df["high"].groupby(func).max(), the aggregation returns one value per group, indexed by the group key (the year), so the original date index is not carried over to the output.
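The difference in the resulting index can be checked directly. The following is a minimal sketch on the question's data (the names `year_of`, `agg`, and `kept` are illustrative): max() produces a result labelled by year, while tail(1) returns rows from the frame itself and therefore keeps their date labels.

```python
import pandas as pd

df = pd.DataFrame(
    {"high": [100, 150, 100]},
    index=pd.DatetimeIndex(["2019-01-01", "2019-04-01", "2020-01-01"], name="date"),
)

year_of = lambda d: d.year  # group key computed from each index value

# Aggregation: one value per group, labelled by the group key.
agg = df["high"].groupby(year_of).max()
print(agg.index.tolist())  # [2019, 2020]

# Row selection: tail(1) keeps the last row of each group with its own label.
kept = df.sort_values("high").groupby(year_of).tail(1)
print(kept.index.tolist())
```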
Upvotes: 2
Reputation: 150735
Another way is to use idxmax with loc access:
df.loc[df.groupby(df.index.year).high.idxmax()]
Output:
high
date
2019-04-01 150
2020-01-01 100
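Splitting the one-liner into two steps may make it easier to follow. This is only a sketch on the question's data, and the names `labels` and `result` are mine. The grouped idxmax returns, for each year, the index label (the date) of the row holding the maximum, and loc then selects those rows from the original frame. If several rows in a year share the maximum, idxmax returns the first of them.

```python
import pandas as pd

df = pd.DataFrame(
    {"high": [100, 150, 100]},
    index=pd.DatetimeIndex(["2019-01-01", "2019-04-01", "2020-01-01"], name="date"),
)

# Step 1: for each year, the date label of the row with the largest "high".
labels = df.groupby(df.index.year)["high"].idxmax()
print(labels)

# Step 2: select those rows by label, which keeps the original date index.
result = df.loc[labels]
print(result)
```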
Upvotes: 1