Reputation: 473
I have a DataFrame with three value columns (plus a Date column), and I would like to count, for each row, how many of its three values also appear in the previous row. The values are strings.
Original DF:
Date num1 num2 num3
Y1 x y z
Y2 b x a
Y3 x c c
Y4 c x d
Y5 x c d
Needed output:
Date num1
Y1 -
Y2 1 <- since only x in previous row
Y3 1 <- since only x in previous
Y4 2 <- since both x and c in previous
Y5 3 <- since all three in previous row
Any thoughts?
Upvotes: 0
Views: 174
Reputation: 4051
When comparing a row with the one before it, you typically want the shift method (DataFrame.shift):
In [90]:
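rel = df.set_index('Date')
# shift() moves every row down by one, so each Date label now holds the previous row's values
shifted = rel.shift()
# x.name is the row's Date label; count how many of the row's values appear in the previous row
rel.apply(lambda x: x.isin(shifted.loc[x.name]).sum(), axis=1)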
Out[90]:
Date
Y1 0
Y2 1
Y3 1
Y4 2
Y5 3
dtype: int64
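For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt from the table in the question; the helper name count_in_previous and the variables counts and result are only illustrative. The last two lines are an assumption about how to get the "-" placeholder the question asks for in the first row, since the shift-based count gives 0 there.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "Date": ["Y1", "Y2", "Y3", "Y4", "Y5"],
    "num1": ["x", "b", "x", "c", "x"],
    "num2": ["y", "x", "c", "x", "c"],
    "num3": ["z", "a", "c", "d", "d"],
})

rel = df.set_index("Date")
shifted = rel.shift()  # each Date label now holds the previous row's values


def count_in_previous(row):
    # row.name is the Date label, used to look up the previous row's values
    return row.isin(shifted.loc[row.name]).sum()


counts = rel.apply(count_in_previous, axis=1)

# Optional: the first row has no predecessor, so show "-" instead of 0.
# This turns the Series into object dtype (mixed ints and a string).
result = counts.astype(object)
result.iloc[0] = "-"
print(result)
```

Note that a value repeated within a row is counted once per occurrence, so a row like x, x, c following a row that contains x would score 2 for the two x's alone.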
Upvotes: 2