nickm
nickm

Reputation: 133

How to reference a previously calculated row in pandas without looping?

Data:

    day    cost    new_column
    1       1        0
    2       1        0
    3       5        enter position
    4       3        stay in position
    5       10       stay in position
    6       1        exit position
    7       1        0

Hi, I'm wondering if there is a way to reference a previous row in a calculated column without looping and using loc/iloc. In the above example, I want to calculate this 'new_column'. Once the cost hits 5, I want to enter the position. Once I am in a position, I want to be able to check on the next line if I am already in a position and check that the price is not 1. If so then I stay in the position. The first row I hit where the cost is 1 and the previous "new_column" is 'stay in position' I want to make 'exit position'. Then the next row with 1, 'new_column' should go back to zero.

How I am solving this now is by looping over each row, looking at the cost on row i and the status of new_column on row i-1, then inserting the result in new_column on row i.

This takes a while on large data sets and I would like to find a more efficient way to do things. I looked into list(map()), but i don't see a way to reference a previous row, because I don't think that data will have been created yet to reference. Any ideas?

Thank you

Upvotes: 0

Views: 65

Answers (1)

kaihami
kaihami

Reputation: 815

Hey as smj suggested one option is using shift.

day = list(range(1,8))
cost = [1,1,5,3,10,1,1]

import pandas as pd
import numpy as np

df = pd.DataFrame({'day':day,'cost':cost}, columns = ['day', 'cost'])
print(df)
df['new'] = np.where(df['cost']> 1, np.where(
                                             df['cost'].shift(-1) >=1,
                                             'stay','a'
                                            ),
                     np.where(
                              df['cost'].shift()>1, 'exit', 0
                             )
                    )
print(df)

Upvotes: 2

Related Questions