nimi1234
nimi1234

Reputation: 188

How to iterate over rows and get max values of any previous rows

I have this dataframe:

pd.DataFrame({'ids':['a','b','c','d','e','f']
             ,'id_order':[1,2,3,4,5,6]
             ,'value':[1000,500,3000,2000,1000,5000]})

What I want is to iterate over the rows and get the maximum value of all previous rows.

For example, when I iterate to id_order==2 I would get 1000 (from id_order 1).

When I move forward to id_order==5 I would get 3000 (from id_order 3)

The desired outcome should be as follows:

pd.DataFrame({'ids':['a','b','c','d','e','f']
             ,'id_order':[1,2,3,4,5,6]
             ,'value':[1000,500,2000,3000,1000,5000]
             ,'outcome':[0,1000,1000,2000,3000,3000]})

This will be done on a big dataset so efficiency is also a factor. I would greatly appreciate your help in this.

Thanks

Upvotes: 3

Views: 808

Answers (1)

Mustafa Aydın
Mustafa Aydın

Reputation: 18306

You can shift the value column and take the cumulative maximum:

df["outcome"] = df.value.shift(fill_value=0).cummax()

Since shifting nullifies the first entry we fill it with 0.

>>> df

  ids  id_order  value  outcome
0   a         1   1000        0
1   b         2    500     1000
2   c         3   3000     1000
3   d         4   2000     3000
4   e         5   1000     3000
5   f         6   5000     3000

Upvotes: 3

Related Questions