Reputation: 749
I have the following Data in a Pandas DataFrame
import pandas as pd
data = [['AAA','2019-01-01', 10], ['AAA','2019-01-02', 21],
['AAA','2019-02-01', 30], ['AAA','2019-02-02', 45],
['BBB','2019-01-01', 50], ['BBB','2019-01-02', 60],
['BBB','2019-02-01', 70],['BBB','2019-02-02', 80]]
dfx = pd.DataFrame(data, columns = ['NAME', 'TIMESTAMP','VALUE'])
NAME TIMESTAMP VALUE
0 AAA 2019-01-01 10
1 AAA 2019-01-02 21
2 AAA 2019-02-01 30
3 AAA 2019-02-02 45
4 BBB 2019-01-01 50
5 BBB 2019-01-02 60
6 BBB 2019-02-01 70
7 BBB 2019-02-02 80
Is it possible to create a column which returns the maximum value of the 'Index' across a group (for e.g. NAME), so the expected output is as follows:
NAME TIMESTAMP VALUE MAXIDX
0 AAA 2019-01-01 10 3
1 AAA 2019-01-02 21 3
2 AAA 2019-02-01 30 3
3 AAA 2019-02-02 45 3
4 BBB 2019-01-01 50 7
5 BBB 2019-01-02 60 7
6 BBB 2019-02-01 70 7
7 BBB 2019-02-02 80 7
Thanks and Regards.
Upvotes: 1
Views: 92
Reputation: 323276
I will do sort_values
+ drop_duplicates
+ merge
df.merge(df.sort_values('VALUE').
drop_duplicates('NAME',keep='last')['NAME'].
reset_index(),on='NAME')
Out[73]:
NAME TIMESTAMP VALUE index
0 AAA 2019-01-01 10 3
1 AAA 2019-01-02 21 3
2 AAA 2019-02-01 30 3
3 AAA 2019-02-02 45 3
4 BBB 2019-01-01 50 7
5 BBB 2019-01-02 60 7
6 BBB 2019-02-01 70 7
7 BBB 2019-02-02 80 7
Upvotes: 0
Reputation: 749
The following code solved my problem for me, thanks to @Jon and @Celius.
dfx.reset_index(drop=False)
dfx['MAXIDX'] = dfx.groupby('NAME')['index'].transform(max)
Upvotes: 1
Reputation: 142156
You can groupby and transform for idxmax, eg:
dfx['MAXIDX'] = dfx.groupby('NAME').transform('idxmax')
Upvotes: 4
Reputation: 18367
Given your example, you can use reset_index() together with groupby and then merge it into the original dataframe:
import pandas as pd
data = [['AAA','2019-01-01', 10], ['AAA','2019-01-02', 21],
['AAA','2019-02-01', 30], ['AAA','2019-02-02', 45],
['BBB','2019-01-01', 50], ['BBB','2019-01-02', 60],
['BBB','2019-02-01', 70],['BBB','2019-02-02', 80]]
dfx = pd.DataFrame(data, columns = ['NAME', 'TIMESTAMP','VALUE'])
dfx = dfx.reset_index(drop=False)
dfx = dfx.merge(dfx.groupby('NAME',as_index=False).agg({'index':'max'}),how='left',on='NAME').rename(columns={'index_y':'max_index'}).drop(columns='index_x')
Output:
NAME TIMESTAMP VALUE max_index
AAA 2019-01-01 10 3
AAA 2019-01-02 21 3
AAA 2019-02-01 30 3
AAA 2019-02-02 45 3
BBB 2019-01-01 50 7
BBB 2019-01-02 60 7
BBB 2019-02-01 70 7
BBB 2019-02-02 80 7
Upvotes: 1