babsdoc
babsdoc

Reputation: 749

Get Column with Max Index Across a Group in Pandas Dataframe

I have the following Data in a Pandas DataFrame

import pandas as pd

data = [['AAA','2019-01-01', 10], ['AAA','2019-01-02', 21],
        ['AAA','2019-02-01', 30], ['AAA','2019-02-02', 45],
        ['BBB','2019-01-01', 50], ['BBB','2019-01-02', 60],
        ['BBB','2019-02-01', 70],['BBB','2019-02-02', 80]]

dfx = pd.DataFrame(data, columns = ['NAME', 'TIMESTAMP','VALUE'])

  NAME   TIMESTAMP  VALUE
0  AAA  2019-01-01     10
1  AAA  2019-01-02     21
2  AAA  2019-02-01     30
3  AAA  2019-02-02     45
4  BBB  2019-01-01     50
5  BBB  2019-01-02     60
6  BBB  2019-02-01     70
7  BBB  2019-02-02     80

Is it possible to create a column which returns the maximum value of the 'Index' across a group (for e.g. NAME), so the expected output is as follows:

  NAME   TIMESTAMP  VALUE MAXIDX
0  AAA  2019-01-01     10    3
1  AAA  2019-01-02     21    3
2  AAA  2019-02-01     30    3
3  AAA  2019-02-02     45    3
4  BBB  2019-01-01     50    7
5  BBB  2019-01-02     60    7
6  BBB  2019-02-01     70    7
7  BBB  2019-02-02     80    7

Thanks and Regards.

Upvotes: 1

Views: 92

Answers (4)

BENY
BENY

Reputation: 323276

I will do sort_values + drop_duplicates + merge

df.merge(df.sort_values('VALUE').
         drop_duplicates('NAME',keep='last')['NAME'].
         reset_index(),on='NAME')
Out[73]: 
  NAME   TIMESTAMP  VALUE  index
0  AAA  2019-01-01     10      3
1  AAA  2019-01-02     21      3
2  AAA  2019-02-01     30      3
3  AAA  2019-02-02     45      3
4  BBB  2019-01-01     50      7
5  BBB  2019-01-02     60      7
6  BBB  2019-02-01     70      7
7  BBB  2019-02-02     80      7

Upvotes: 0

babsdoc
babsdoc

Reputation: 749

The following code solved my problem for me, thanks to @Jon and @Celius.

dfx.reset_index(drop=False)
dfx['MAXIDX'] = dfx.groupby('NAME')['index'].transform(max)

Upvotes: 1

Jon Clements
Jon Clements

Reputation: 142156

You can groupby and transform for idxmax, eg:

dfx['MAXIDX'] = dfx.groupby('NAME').transform('idxmax')

Upvotes: 4

Celius Stingher
Celius Stingher

Reputation: 18367

Given your example, you can use reset_index() together with groupby and then merge it into the original dataframe:

import pandas as pd

data = [['AAA','2019-01-01', 10], ['AAA','2019-01-02', 21],
        ['AAA','2019-02-01', 30], ['AAA','2019-02-02', 45],
        ['BBB','2019-01-01', 50], ['BBB','2019-01-02', 60],
        ['BBB','2019-02-01', 70],['BBB','2019-02-02', 80]]

dfx = pd.DataFrame(data, columns = ['NAME', 'TIMESTAMP','VALUE'])

dfx = dfx.reset_index(drop=False)
dfx = dfx.merge(dfx.groupby('NAME',as_index=False).agg({'index':'max'}),how='left',on='NAME').rename(columns={'index_y':'max_index'}).drop(columns='index_x')

Output:

NAME    TIMESTAMP   VALUE   max_index
 AAA    2019-01-01     10           3
 AAA    2019-01-02     21           3
 AAA    2019-02-01     30           3
 AAA    2019-02-02     45           3
 BBB    2019-01-01     50           7
 BBB    2019-01-02     60           7
 BBB    2019-02-01     70           7
 BBB    2019-02-02     80           7

Upvotes: 1

Related Questions