Jui Sen
Jui Sen

Reputation: 377

Maximum value for a particular ID

I have a data frame:

df = pd.DataFrame({"id": [1,1,1,2,2,2,3,3,3], "elev": [10,11,2,45,5,4,4,4,9], "year": [2009,2010,2010,2009,2010,2011,2006,2007,2007]})

id  elev   year
1    10    2009
1    11    2010
1    2     2010
2    45    2009
2     5    2010
2     4    2011
3     4    2006
3     4    2007
3     9    2007

I want to keep only the maximum year rows for each ID:

id  elev   year
1     11    2010
1     2     2010
2     4     2011
3     4     2007
3     9     2007

I tried the following code:

df = df.groupby("id")

df = df.max('year')
df = df.reset_index()

But its keeps only single row for each ID.

Upvotes: 1

Views: 1155

Answers (1)

sammywemmy
sammywemmy

Reputation: 28709

You need to create a boolean to filter the original dataframe with; this is possible with transform:

In [164]: df.loc[df.year.eq(df.groupby('id').year.transform('max'))]
Out[164]: 
   id  elev  year
1   1    11  2010
2   1     2  2010
5   2     4  2011
7   3     4  2007
8   3     9  2007

Breakdown: This generates the max for each ID, and appends to each relevant index.

In [165]: df.groupby('id').year.transform('max')
Out[165]: 
0    2010
1    2010
2    2010
3    2011
4    2011
5    2011
6    2007
7    2007
8    2007
Name: year, dtype: int64

Step2: Compare the year with the max

df.year.eq(df.groupby('id').year.transform('max'))

0    False
1     True
2     True
3    False
4    False
5     True
6    False
7     True
8     True
Name: year, dtype: bool

Step3 : Indexing

In [164]: df.loc[df.year.eq(df.groupby('id').year.transform('max'))]
Out[164]: 
   id  elev  year
1   1    11  2010
2   1     2  2010
5   2     4  2011
7   3     4  2007
8   3     9  2007

Upvotes: 3

Related Questions