Reputation: 2061
I am looking to create a new column in a dataframe based on the values seen in the next 2 rows. Specifically, if any values in the next 2 rows are below 4, then I want the new value in the current row to be 0 (and if all values in the next 2 rows are above 4 then I want the new value in the current row to be 1).
>>> df = pandas.DataFrame({"A": [5,6,7,3,2]})
>>> df
A
0 5
1 6
2 7
3 8
4 2
>>> desired_result = pandas.DataFrame({"A": [5,6,7,8,2], "new": [1,1,0,0,0]})
>>> desired_result
A new
0 5 1
1 6 1
2 7 0
3 8 0
4 2 0
Where you can see that in the "desired_result" the first value is 1 because 6 and 7 are both > 4 (and hte same logic applies) until in the third row the new value becomes 0 because when we look ahead to the next two rows (8,2) then we see that 2 is < 4 so the value becomes 0.
I have been trying to use the apply function but I cannot figure out how to pass along the next 2 row values as inputs.
I have found lots of help on this site about comparing across columns, but cannot figure out how to "look ahead" like I described.
Thanks for the help!
Upvotes: 2
Views: 498
Reputation: 109546
You can set the new
value to one and then use loc
together with shift
and lt
(less than) to set the appropriate values to zero.
df = pd.DataFrame({"A": [5, 6, 7, 8, 2]})
df['new'] = 1
df.loc[(df.A.shift(-1).lt(4)) | (df.A.shift(-2).lt(4)), 'new'] = 0
# The last value does not have any future observations and should be set to zero.
df.new.iat[-1] = 0
>>> df
A new
0 5 1
1 6 1
2 7 0
3 8 0
4 2 0
To expand to the next 8 rows instead of 2:
nrows = 8
df.loc[eval(" | ".join("df.A.shift(-{0}).lt(4)".format(n)
for n in range(1, nrows + 1))), 'new'] = 0
Upvotes: 3