Reputation: 2146
I am in a loop which gives me a groupby output like the one below:
df.groupby(['grp1','grp2'])['mycol'].sum()
Basically I am getting the sum of my grouped elements:
grp1  grp2
A     1       10
B     1       20
C     2       30
D     3       40
E     4       50
      1       60
Now in the next iteration I may get a grouped df like this:
grp1  grp2
A     1       20
D     3       40
E     4       30
      1       90
F     1       40
I want to take the max from each iteration.
So after the 2nd iteration I have an output like this:
grp1  grp2
A     1       20    # because 20 was higher than 10
B     1       20    # carried as it is
C     2       30    # carried as it is
D     3       40    # carried as it is (both were equal)
E     4       30    # because 90+30 > 50+60
      1       90
F     1       40    # added
So by the end I know which group reached its peak value during, say, 5 iterations.
It sounds straightforward (keep track of the max seen so far), but I am not sure how to approach this.
I tried doing df.groupby(['grp1','grp2'])['mycol'].sum().to_dict()
and then updating the dict as each new df is read (just a try, I am not sure how to keep the dict updated), or maybe there is a simple pandas/numpy solution I don't know yet.
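Something like this is roughly what I had in mind (just a rough sketch, assuming the per-iteration frames sit in a hypothetical list called list_of_dfs; it only keeps a simple per-(grp1, grp2) max and ignores the sum comparison I described for E):

import pandas as pd

best = {}  # (grp1, grp2) -> highest sum seen so far
for df in list_of_dfs:  # hypothetical: one raw frame per iteration
    s = df.groupby(['grp1', 'grp2'])['mycol'].sum()
    for key, val in s.items():
        # keep the larger of the stored value and the new one
        best[key] = max(best.get(key, val), val)

result = pd.Series(best)  # peak value per (grp1, grp2) pair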
Upvotes: 1
Views: 50
Reputation: 1848
You can use groupby again! :-) Concatenate the per-iteration results, group by the index levels (the level param), and take .max(). Shown in code:
# stack every iteration's Series, then take the per-group maximum
res = pd.concat(list_of_iteration_results)
res = res.groupby(level=['grp1', 'grp2']).max()
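For context, list_of_iteration_results is just the per-iteration groupby results collected inside your loop, roughly like this (a sketch, assuming dfs is a hypothetical iterable holding your raw frames):

import pandas as pd

list_of_iteration_results = []
for df in dfs:  # hypothetical iterable: one raw frame per iteration
    # each result keeps the (grp1, grp2) MultiIndex,
    # which is what groupby(level=...) groups on afterwards
    list_of_iteration_results.append(
        df.groupby(['grp1', 'grp2'])['mycol'].sum()
    )

Because the concatenated Series keeps the MultiIndex, groupby(level=['grp1', 'grp2']).max() collapses the stacked rows back down to one peak value per group.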
Upvotes: 2