Reputation: 534
I would like to create a 3rd column in my dataframe, which depends on both the new and existing columns in the previous row.
This new column should start at 0.
I would like my 3rd column to start at 0.
Its next value is its previous value plus df.below_lo[i]
(if the previous value was 0).
If its previous value was 1, its next value is its previous value plus df.above_hi[i]
.
I think I have two issues: how to initiate this 3rd column and how to make it dependent on itself.
import pandas as pd
import math
data = {'below_lo': [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
'above_hi': [0, 0, -1, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0]}
df = pd.DataFrame(data)
df['pos'] = math.nan
df['pos'][0] = 0
for i in range(len(df.below_lo)):
if df.pos[i] == 0:
df.pos[i+1] = df.pos[i] + df.below_lo[i]
if df.pos[i] == 1:
df.pos[i+1] = df.pos[i] + df.above_hi[i]
print(df)
The desired output is:
below_lo above_hi pos
0 0.0 0.0 0.0
1 1.0 0.0 0.0
2 0.0 -1.0 1.0
3 0.0 0.0 0.0
4 0.0 -1.0 0.0
5 0.0 0.0 0.0
6 0.0 -1.0 0.0
7 0.0 0.0 0.0
8 0.0 0.0 0.0
9 1.0 0.0 0.0
10 0.0 0.0 1.0
11 0.0 0.0 1.0
12 0.0 0.0 1.0
13 NaN NaN 1.0
The above code produces the correct output, except I am also getting a few of these error messages:
A value is trying to be set on a copy of a slice from a DataFrame
How do I clean this code up so that it runs without throwing this warning? ?
Upvotes: 2
Views: 2172
Reputation: 123
A solution that works faster is
df_ser['go_up'] = 0
df_ser['go_down'] = 0
df_ser.loc[(((df_ser["awake"] == 1) & (df_ser["awake"].shift() == 0) ), 'go_up'] = 1 # Go from 0 to 1
df_ser.loc[(((df_ser["awake"] == 0) & (df_ser["awake"].shift() == 1) ), 'go_down'] = 1 # Switch from 1 to 0
avoid using range()
, pandas works at column level,
.shift()
picks up immediate previous row
Upvotes: 0
Reputation: 8219
Appreciate there is an accepted, and perfectly good, answer by @Michael O. already, but if you dislike iterating over rows as not-quite Pandas-esque, here is a solution without explicit looping over rows:
from functools import reduce
res = reduce(lambda d, _ :
d.fillna({'pos':d['pos'].shift(1)
+ (d['pos'].shift(1) == 0) * d['below_lo']
+ (d['pos'].shift(1) == 1) * d['above_hi']}),
range(len(df)), df)
res
produces
below_lo above_hi pos
-- ---------- ---------- -----
0 0 0 0
1 1 0 1
2 0 -1 0
3 0 0 0
4 0 -1 0
5 0 0 0
6 0 -1 0
7 0 0 0
8 0 0 0
9 1 0 1
10 0 0 1
11 0 0 1
12 0 0 1
It is, admittedly, somewhat less efficient and has a bit more obscure syntax. But it could be written on a single line (even if I split it over a few for clarity)!
The idea is that we can use fillna(..)
function by passing the value, calculated from the previous value of 'pos' (hence shift(1)
) and current values of 'below_lo' and 'above_hi'. The extra complication here is that this operation will only fill NaN
with a non-NaN for the row just below the one with non-NaN value. Hence we need to apply this function repeatedly until all NaNs are filled, and this is where reduce
comes into play
Upvotes: 0
Reputation: 5335
Use .loc
:
df.loc[0, 'pos'] = 0
for i in range(len(df.below_lo)):
if df.loc[i, 'pos'] == 0:
df.loc[i+1, 'pos'] = df.loc[i, 'pos'] + df.loc[i, 'below_lo']
if df.loc[i, 'pos'] == 1:
df.loc[i+1, 'pos'] = df.loc[i, 'pos'] + df.loc[i, 'above_hi']
Upvotes: 2