lonstud
lonstud

Reputation: 525

How to populate rows of pandas dataframe column based with previous row based on a multiple conditions?

Disclaimer: This might be possible duplicate but I cannot find the exact solution. Please feel free to mark this question as duplicate and provide link to duplicate question in comments.

I am still learning python dataframe operations and this possibly has a very simple solution which I am not able to figure out.

I have a python dataframe with a single columns. Now I want to change value of each row to value of previous row if certain conditions are satisfied. I have created a loop solution to implement this but I was hoping for a more efficient solution.

Creation of initial data:

import numpy as np
import pandas as pd

data = np.random.randint(5,30,size=20)
df = pd.DataFrame(data, columns=['random_numbers'])

print(df)

    random_numbers
0                6
1               24
2               29
3               18
4               22
5               17
6               12
7                7
8                6
9               27
10              29
11              13
12              23
13               6
14              25
15              24
16              16
17              15
18              25
19              19

Now lets assume two condition are 1) value less than 10 and 2) value more than 20. In any of these cases, set row value to previous row value. This has been implement in loop format as follows:

for index,row in df.iterrows():
    if index == 0:
        continue;
    if(row.random_numbers<10):
        df.loc[index,'random_numbers']=df.loc[index-1,'random_numbers']
    if(row.random_numbers>20):
        df.loc[index,'random_numbers']=df.loc[index-1,'random_numbers']

    random_numbers
0                6
1                6
2                6
3               18
4               18
5               17
6               12
7               12
8               12
9               12
10              12
11              13
12              13
13              13
14              13
15              13
16              16
17              15
18              15
19              19

Please suggest a more efficient way to implement this logic as I am using large number of rows.

Upvotes: 5

Views: 2563

Answers (2)

SeaBean
SeaBean

Reputation: 23217

We can also do it in a simpler way by using .mask() together with .ffill() and slicing on [1:] as follows:

df['random_numbers'][1:] = df['random_numbers'][1:].mask((df['random_numbers'] < 10) | (df['random_numbers'] > 20))

df['random_numbers'] = df['random_numbers'].ffill(downcast='infer')

.mask() tests for the condition and replace with NaN when the condition is true (default to replace with NaN if the parameter other= is not supplied). Retains the original values for rows with condition not met.

Note that the resulting numbers are maintained as integer instead of transformed unexpectedly to float type by supplying the downcast='infer' in the call to .ffill().

We use [1:] on the first line to ensure the data on row 0 is untouched without transformation.

# Original data:  (reusing your sample data)

    random_numbers
0                6
1               24
2               29
3               18
4               22
5               17
6               12
7                7
8                6
9               27
10              29
11              13
12              23
13               6
14              25
15              24
16              16
17              15
18              25
19              19


# After transposition:

    random_numbers
0                6
1                6
2                6
3               18
4               18
5               17
6               12
7               12
8               12
9               12
10              12
11              13
12              13
13              13
14              13
15              13
16              16
17              15
18              15
19              19

Upvotes: 2

Ynjxsjmh
Ynjxsjmh

Reputation: 30012

You can replace the values less than 10 and values more than 20 with NaN then use pandas.DataFrame.ffill() to fill nan with previous row value.

mask = (df['random_numbers'] < 10) | (df['random_numbers'] > 20)

# Since you escape with `if index == 0:`
mask[df.index[0]] = False

df.loc[mask, 'random_numbers'] = np.nan

df['random_numbers'].ffill(inplace=True)
# Original

    random_numbers
0                7
1               28
2                8
3               14
4               12
5               20
6               21
7               11
8               16
9               27
10              19
11              23
12              18
13               5
14               6
15              11
16               6
17               8
18              17
19               8

# After replaced

    random_numbers
0              7.0
1              7.0
2              7.0
3             14.0
4             12.0
5             20.0
6             20.0
7             11.0
8             16.0
9             16.0
10            19.0
11            19.0
12            18.0
13            18.0
14            18.0
15            11.0
16            11.0
17            11.0
18            17.0
19            17.0

Upvotes: 6

Related Questions