How to populate rows of pandas dataframe column based with previous row based on a multiple conditions?

Question

Disclaimer: This might be possible duplicate but I cannot find the exact solution. Please feel free to mark this question as duplicate and provide link to duplicate question in comments.

I am still learning python dataframe operations and this possibly has a very simple solution which I am not able to figure out.

I have a python dataframe with a single columns. Now I want to change value of each row to value of previous row if certain conditions are satisfied. I have created a loop solution to implement this but I was hoping for a more efficient solution.

Creation of initial data:

import numpy as np
import pandas as pd

data = np.random.randint(5,30,size=20)
df = pd.DataFrame(data, columns=['random_numbers'])

print(df)

    random_numbers
0                6
1               24
2               29
3               18
4               22
5               17
6               12
7                7
8                6
9               27
10              29
11              13
12              23
13               6
14              25
15              24
16              16
17              15
18              25
19              19

Now lets assume two condition are 1) value less than 10 and 2) value more than 20. In any of these cases, set row value to previous row value. This has been implement in loop format as follows:

for index,row in df.iterrows():
    if index == 0:
        continue;
    if(row.random_numbers<10):
        df.loc[index,'random_numbers']=df.loc[index-1,'random_numbers']
    if(row.random_numbers>20):
        df.loc[index,'random_numbers']=df.loc[index-1,'random_numbers']

    random_numbers
0                6
1                6
2                6
3               18
4               18
5               17
6               12
7               12
8               12
9               12
10              12
11              13
12              13
13              13
14              13
15              13
16              16
17              15
18              15
19              19

Please suggest a more efficient way to implement this logic as I am using large number of rows.

Ynjxsjmh · Accepted Answer

You can replace the values less than 10 and values more than 20 with NaN then use pandas.DataFrame.ffill() to fill nan with previous row value.

mask = (df['random_numbers'] < 10) | (df['random_numbers'] > 20)

# Since you escape with `if index == 0:`
mask[df.index[0]] = False

df.loc[mask, 'random_numbers'] = np.nan

df['random_numbers'].ffill(inplace=True)

# Original

    random_numbers
0                7
1               28
2                8
3               14
4               12
5               20
6               21
7               11
8               16
9               27
10              19
11              23
12              18
13               5
14               6
15              11
16               6
17               8
18              17
19               8

# After replaced

    random_numbers
0              7.0
1              7.0
2              7.0
3             14.0
4             12.0
5             20.0
6             20.0
7             11.0
8             16.0
9             16.0
10            19.0
11            19.0
12            18.0
13            18.0
14            18.0
15            11.0
16            11.0
17            11.0
18            17.0
19            17.0

How to populate rows of pandas dataframe column based with previous row based on a multiple conditions?

Answers (2)

Related Questions