Reputation: 133
Data:
day cost new_column
1 1 0
2 1 0
3 5 enter position
4 3 stay in position
5 10 stay in position
6 1 exit position
7 1 0
Hi, I'm wondering if there is a way to reference a previous row in a calculated column without looping and using loc/iloc. In the above example, I want to calculate this 'new_column'. Once the cost hits 5, I want to enter the position. Once I am in a position, I want to be able to check on the next line if I am already in a position and check that the price is not 1. If so then I stay in the position. The first row I hit where the cost is 1 and the previous "new_column" is 'stay in position' I want to make 'exit position'. Then the next row with 1, 'new_column' should go back to zero.
How I am solving this now is by looping over each row, looking at the cost on row i and the status of new_column on row i-1, then inserting the result in new_column on row i.
This takes a while on large data sets and I would like to find a more efficient way to do things. I looked into list(map()), but i don't see a way to reference a previous row, because I don't think that data will have been created yet to reference. Any ideas?
Thank you
Upvotes: 0
Views: 65
Reputation: 815
Hey as smj suggested one option is using shift.
day = list(range(1,8))
cost = [1,1,5,3,10,1,1]
import pandas as pd
import numpy as np
df = pd.DataFrame({'day':day,'cost':cost}, columns = ['day', 'cost'])
print(df)
df['new'] = np.where(df['cost']> 1, np.where(
df['cost'].shift(-1) >=1,
'stay','a'
),
np.where(
df['cost'].shift()>1, 'exit', 0
)
)
print(df)
Upvotes: 2