oranJess

Reputation: 598

Update row value based on the most recent value of the previous row

Suppose I have a pandas DataFrame:

RowNum PageName OfInterest
0 home False
1 photo False
2 list True
3 photo False
4 photo False
5 photo False
6 home False
7 photo False

The OfInterest value for rows with PageName=photo should be set to True only if they follow a PageName=list row, either directly or through an unbroken run of photo rows.

In my desired output, rows 3, 4 and 5 are changed, but rows 1 and 7 are not:

RowNum PageName OfInterest
0 home False
1 photo False
2 list True
3 photo True
4 photo True
5 photo True
6 home False
7 photo False

I tried to do this with apply(), but each row only sees the original values, not the values already changed for earlier rows.

def changeInterest(x):
  followsOfInterest = (x['PageName'] == 'photo') and (x['PrevOfInterest'])
  return followsOfInterest or x['OfInterest']

# Flag from the previous row (shift(1)); the first row has no predecessor
df['PrevOfInterest'] = df['OfInterest'].shift(1, fill_value=False)
df['OfInterest'] = df[['PageName', 'OfInterest', 'PrevOfInterest']].apply(changeInterest, axis=1)

I know I can accomplish the same using a loop, but I would like to find a more elegant solution.
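
For reference, the loop version I have in mind would look roughly like the sketch below. The helper name flag_photos_after_list and the after_list state variable are made up for illustration, and the DataFrame is rebuilt from the table above so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "RowNum": range(8),
    "PageName": ["home", "photo", "list", "photo", "photo", "photo", "home", "photo"],
    "OfInterest": [False, False, True, False, False, False, False, False],
})

def flag_photos_after_list(frame):
    """Hypothetical helper: walk the rows once, remembering whether the
    most recent non-photo page was 'list'."""
    flags = []
    after_list = False
    for page, flag in zip(frame["PageName"], frame["OfInterest"]):
        if page == "list":
            after_list = True
        elif page != "photo":
            # Any other page (e.g. 'home') breaks the run
            after_list = False
        flags.append(bool(flag) or (page == "photo" and after_list))
    return pd.Series(flags, index=frame.index)

df["OfInterest"] = flag_photos_after_list(df)
```

This gives the desired output for the sample data, but it is the row-by-row approach I would like to avoid.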

Upvotes: 1

Views: 67

Answers (1)

anky

Reputation: 75080

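You can use replace and ffill here: replace 'photo' with NaN, forward-fill so that each photo row takes the most recent non-photo page name, then check whether that filled value is 'list'.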

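import numpy as np
# Blank out 'photo', forward-fill the last non-photo page, then test for 'list'
s = df['PageName'].replace('photo', np.nan).ffill().eq('list') | df['OfInterest']
df['OfInterest'] = s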

print(df)

   RowNum PageName  OfInterest
0       0     home       False
1       1    photo       False
2       2     list        True
3       3    photo        True
4       4    photo        True
5       5    photo        True
6       6     home       False
7       7    photo       False
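
If you want something you can paste and run, here is a minimal self-contained sketch of the same idea. It is a variant, not the exact code above: it uses Series.mask instead of replace to blank out the photo rows, and it adds an is_photo check so that only photo rows can be newly flagged. The names is_photo and last_non_photo are just illustrative, and the DataFrame is rebuilt from the sample in the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "RowNum": range(8),
    "PageName": ["home", "photo", "list", "photo", "photo", "photo", "home", "photo"],
    "OfInterest": [False, False, True, False, False, False, False, False],
})

is_photo = df["PageName"].eq("photo")

# Hide photo rows (mask sets them to NaN), then carry the last
# non-photo page name forward over each run of photos
last_non_photo = df["PageName"].mask(is_photo).ffill()

# A photo row qualifies when the page that started its run was 'list';
# rows that were already True stay True
df["OfInterest"] = df["OfInterest"] | (is_photo & last_non_photo.eq("list"))
```

On the sample data this flags rows 3, 4 and 5 and leaves rows 1 and 7 alone, because their most recent non-photo page is home rather than list.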

Upvotes: 3
