Reputation: 188
I have this dataframe:
pd.DataFrame({'ids':['a','b','c','d','e','f']
,'id_order':[1,2,3,4,5,6]
,'value':[1000,500,3000,2000,1000,5000]})
What I want is to iterate over the rows and get the maximum value of all previous rows.
For example, when I iterate to id_order==2
I would get 1000 (from id_order 1).
When I move forward to id_order==5
I would get 3000 (from id_order 3)
The desired outcome should be as follows:
pd.DataFrame({'ids':['a','b','c','d','e','f']
,'id_order':[1,2,3,4,5,6]
,'value':[1000,500,2000,3000,1000,5000]
,'outcome':[0,1000,1000,2000,3000,3000]})
This will be done on a big dataset so efficiency is also a factor. I would greatly appreciate your help in this.
Thanks
Upvotes: 3
Views: 808
Reputation: 18306
You can shift the value
column and take the cumulative maximum:
df["outcome"] = df.value.shift(fill_value=0).cummax()
Since shifting nullifies the first entry we fill it with 0.
>>> df
ids id_order value outcome
0 a 1 1000 0
1 b 2 500 1000
2 c 3 3000 1000
3 d 4 2000 3000
4 e 5 1000 3000
5 f 6 5000 3000
Upvotes: 3