Reputation: 374
I have a dataframe and I want to get the max number of counts and its corresponding year for movies and series. The df is as follows:
df = pd.DataFrame(
[
("movie", "2020", 45),
("movie", "2019", 37),
("series", "2018", 16),
("series", "2010", 45),
],
columns=("type", "year", "count"),
)
What I've tried:
df[df['type'] == "series"].max()
df[df['type'] == "movie"].max()
However this does not seem to work well as it returns the max year and max count. As for example, for the series one, it will return:
type series
year 2018
count 45
dtype: object
and it should return:
type series
year 2010
count 45
dtype: object
How should I build my query so it returns the highest count with its corresponding year?
Thank you
Upvotes: 1
Views: 45
Reputation: 323306
Try with idxmax
out = df.loc[df.groupby('type')['count'].idxmax()]
Out[200]:
type year count
0 movie 2020 45
3 series 2010 45
Upvotes: 3
Reputation: 150765
You can try sort and drop duplicates:
df.sort_values('count').drop_duplicates('type', keep='last')
Output:
type year count
0 movie 2020 45
3 series 2010 45
Upvotes: 2