How to iterate over rows and get max values of any previous rows

Question

I have this dataframe:

pd.DataFrame({'ids':['a','b','c','d','e','f']
             ,'id_order':[1,2,3,4,5,6]
             ,'value':[1000,500,3000,2000,1000,5000]})

What I want is to iterate over the rows and get the maximum value of all previous rows.

For example, when I iterate to id_order==2 I would get 1000 (from id_order 1).

When I move forward to id_order==5 I would get 3000 (from id_order 3)

The desired outcome should be as follows:

pd.DataFrame({'ids':['a','b','c','d','e','f']
             ,'id_order':[1,2,3,4,5,6]
             ,'value':[1000,500,2000,3000,1000,5000]
             ,'outcome':[0,1000,1000,2000,3000,3000]})

This will be done on a big dataset so efficiency is also a factor. I would greatly appreciate your help in this.

Thanks

Mustafa Aydın · Accepted Answer

You can shift the value column and take the cumulative maximum:

df["outcome"] = df.value.shift(fill_value=0).cummax()

Since shifting nullifies the first entry we fill it with 0.

>>> df

  ids  id_order  value  outcome
0   a         1   1000        0
1   b         2    500     1000
2   c         3   3000     1000
3   d         4   2000     3000
4   e         5   1000     3000
5   f         6   5000     3000

How to iterate over rows and get max values of any previous rows

Answers (1)

Related Questions