AtotheSiv

Reputation: 473

Pandas check for row equivalence

I have a DataFrame with a Date column and three value columns. For each row, I would like to count how many of its three values also appear in the previous row. The values are strings.

Original DF:

Date    num1    num2    num3
Y1      x       y       z
Y2      b       x       a
Y3      x       c       c
Y4      c       x       d
Y5      x       c       d

Needed output:

Date    num1    
Y1      -       
Y2      1       <- since only x in previous row
Y3      1       <- since only x in previous row
Y4      2       <- since both x and c in previous 
Y5      3       <- since all three in previous row

Any thoughts?

Upvotes: 0

Views: 174

Answers (1)

ZJS

Reputation: 4051

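Typically, when comparing each row with the previous one, you want to use the shift method: shift the frame down by one row, then check each row's values against the shifted row with the same label.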

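In [90]:

# Index by Date so each row can be looked up by its label
rel = df.set_index('Date')
# At each label, shifted holds the values of the previous row
shifted = rel.shift()

# For each row, count how many of its values appear in the previous row
rel.apply(lambda x: x.isin(shifted.loc[x.name]).sum(), axis=1)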
Out[90]:
Date
Y1      0
Y2      1
Y3      1
Y4      2
Y5      3
dtype: int64
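
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same shift-and-isin approach. The DataFrame is rebuilt from the sample data in the question. The helper name count_in_previous_row and its index_col parameter are made up for illustration and are not part of pandas. It assumes the Date values are unique, since the lookup shifted.loc[row.name] would otherwise return more than one row.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "Date": ["Y1", "Y2", "Y3", "Y4", "Y5"],
        "num1": ["x", "b", "x", "c", "x"],
        "num2": ["y", "x", "c", "x", "c"],
        "num3": ["z", "a", "c", "d", "d"],
    }
)


def count_in_previous_row(frame, index_col="Date"):
    """Hypothetical helper: per row, count values also present in the previous row.

    Assumes the labels in index_col are unique.
    """
    rel = frame.set_index(index_col)
    # Row i of shifted holds row i-1 of rel; the first row is all missing
    shifted = rel.shift()
    # isin tests each value of the current row against the previous row's values
    return rel.apply(lambda row: row.isin(shifted.loc[row.name]).sum(), axis=1)


counts = count_in_previous_row(df)
print(counts)
```

The first row has no predecessor, so it comes out as 0 rather than the "-" shown in the question. Each column is checked separately, so a value repeated within a row (like the two c's in Y3) is counted once per column in which it matches.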

Upvotes: 2
