Mattpats

Reputation: 534

Pandas column that depends on its previous value (row)?

I would like to create a third column in my dataframe whose value in each row depends on both the existing columns and the new column's own value in the previous row.

This new column should start at 0.

If its previous value was 0, its next value is the previous value plus df.below_lo[i], where i is the previous row.

If its previous value was 1, its next value is the previous value plus df.above_hi[i].

I think I have two issues: how to initialize this third column, and how to make it depend on itself.

import pandas as pd
import math

data = {'below_lo': [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
        'above_hi': [0, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0]}

df = pd.DataFrame(data)

df['pos'] = math.nan
df['pos'][0] = 0

for i in range(len(df.below_lo)):
    if df.pos[i] == 0:
        df.pos[i+1] = df.pos[i] + df.below_lo[i]
    if df.pos[i] == 1:
        df.pos[i+1] = df.pos[i] + df.above_hi[i]

print(df)

The desired output is:

    below_lo  above_hi  pos
0        0.0       0.0  0.0
1        1.0       0.0  0.0
2        0.0      -1.0  1.0
3        0.0       0.0  0.0
4        0.0      -1.0  0.0
5        0.0       0.0  0.0
6        0.0      -1.0  0.0
7        0.0       0.0  0.0
8        0.0       0.0  0.0
9        1.0       0.0  0.0
10       0.0       0.0  1.0
11       0.0       0.0  1.0
12       0.0       0.0  1.0
13       NaN       NaN  1.0

The above code produces the correct output, but I am also getting several of these warnings (pandas' SettingWithCopyWarning):

A value is trying to be set on a copy of a slice from a DataFrame

How do I clean this code up so that it runs without raising this warning?

Upvotes: 2

Views: 2172

Answers (3)

Luis l

Reputation: 123

A solution that works faster is:

df_ser['go_up'] = 0
df_ser['go_down'] = 0
# Fixed the unbalanced parentheses in the boolean masks
df_ser.loc[(df_ser["awake"] == 1) & (df_ser["awake"].shift() == 0), 'go_up'] = 1  # Switch from 0 to 1
df_ser.loc[(df_ser["awake"] == 0) & (df_ser["awake"].shift() == 1), 'go_down'] = 1  # Switch from 1 to 0

Avoid using range(): pandas works at the column level, and .shift() gives you the value from the immediately preceding row.
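As a minimal sketch of how .shift() lines up each row with its predecessor, here is the same idea on a small made-up frame. The data is hypothetical; only the "awake", "go_up" and "go_down" names come from the snippet above. Note that this approach fits columns that depend on the previous row of an existing column, not on their own earlier values.

```python
import pandas as pd

# Hypothetical toy data, not taken from the question.
df_ser = pd.DataFrame({"awake": [0, 1, 1, 0, 0, 1]})

# Value of "awake" in the previous row; the first row has no predecessor, so it is NaN.
prev = df_ser["awake"].shift()

# Comparisons against NaN are False, so the first row is never flagged.
df_ser["go_up"] = ((df_ser["awake"] == 1) & (prev == 0)).astype(int)
df_ser["go_down"] = ((df_ser["awake"] == 0) & (prev == 1)).astype(int)

print(df_ser)
```

Building the masks once and converting them with astype(int) avoids the two-step "initialize to 0, then overwrite with .loc" pattern, but both forms should give the same flags.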

Upvotes: 0

piterbarg

Reputation: 8219

I appreciate there is already an accepted, and perfectly good, answer by @Michael O., but if you dislike iterating over rows as not quite pandas-esque, here is a solution without an explicit loop over rows:

from functools import reduce
res = reduce(lambda d, _ : 
    d.fillna({'pos':d['pos'].shift(1) 
            + (d['pos'].shift(1)  == 0) * d['below_lo'] 
            + (d['pos'].shift(1)  == 1) * d['above_hi']}), 
        range(len(df)), df)
res

produces

      below_lo    above_hi    pos
--  ----------  ----------  -----
 0           0           0      0
 1           1           0      1
 2           0          -1      0
 3           0           0      0
 4           0          -1      0
 5           0           0      0
 6           0          -1      0
 7           0           0      0
 8           0           0      0
 9           1           0      1
10           0           0      1
11           0           0      1
12           0           0      1

It is, admittedly, somewhat less efficient (each of the len(df) passes recomputes the whole column) and the syntax is a bit more obscure. But it could be written on a single line (even if I split it over a few for clarity)!

The idea is to call fillna(..) with a value calculated from the previous value of 'pos' (hence shift(1)) and the current values of 'below_lo' and 'above_hi'. The complication is that each call can only replace a NaN with a non-NaN in the row directly below one that already holds a non-NaN value. Hence we need to apply the function repeatedly until all NaNs are filled, and this is where reduce comes into play.
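To see why repeated application is needed, here is a small sketch that runs a single pass twice on a four-row frame. The helper name one_pass and the toy data are illustrative assumptions, and it calls Series.fillna on the 'pos' column rather than the dict form used above, which should be equivalent here.

```python
import math
import pandas as pd

# Hypothetical four-row frame with the same column names as the question.
d = pd.DataFrame({"below_lo": [0, 1, 0, 0], "above_hi": [0, 0, -1, 0]})
d["pos"] = math.nan
d.loc[0, "pos"] = 0  # seed value; every other row starts as NaN


def one_pass(frame):
    """Fill only the NaNs whose previous row already holds a value."""
    prev = frame["pos"].shift(1)
    candidate = (prev
                 + (prev == 0) * frame["below_lo"]
                 + (prev == 1) * frame["above_hi"])
    # Where prev is NaN the candidate is NaN too, so those rows stay unfilled.
    return frame.assign(pos=frame["pos"].fillna(candidate))


step1 = one_pass(d)      # fills row 1 only
step2 = one_pass(step1)  # fills row 2 as well

print(step1["pos"].isna().sum(), step2["pos"].isna().sum())
```

Each pass pushes the filled region down by one row, which is why the reduce call applies the function len(df) times.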

Upvotes: 0

Michael

Reputation: 5335

Use .loc, so that each assignment goes to the DataFrame itself rather than through chained indexing:

df.loc[0, 'pos'] = 0

for i in range(len(df.below_lo)):
    if df.loc[i, 'pos'] == 0:
        df.loc[i+1, 'pos'] = df.loc[i, 'pos'] + df.loc[i, 'below_lo']
    if df.loc[i, 'pos'] == 1:
        df.loc[i+1, 'pos'] = df.loc[i, 'pos'] + df.loc[i, 'above_hi']

Upvotes: 2
