blue-sky

Reputation: 53806

Apply function to every column value of each row using pandas

For this dataframe:

import numpy as np
import pandas as pd

columns = ['A', 'B', 'C']
data = np.array([[1, 2, 2], [4, 5, 4], [7, 8, 18]])
df2 = pd.DataFrame(data, columns=columns)
df2['C']

If the difference between consecutive rows for column C is <= 2, then the previous and current rows should be returned. So I'm attempting to filter out rows where the difference from the previous row is > 2.

So I'm expecting these rows to be returned:

    [1,2,2] 

    [4,5,4]

    [7,8,18]

I'm attempting to implement this functionality using the shift function :

df2[(df2.A - df2.shift(1).A >= 2)]

The result of which is :

    A   B   C
1   4   5   4
2   7   8   18

I think I need to apply a function to each row in order to achieve this?
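For reference, a row-wise apply may not be necessary here. The following is only a sketch, assuming the rule used in the attempt above (column A, difference >= 2) rather than the column C wording: it combines the comparison with a shifted copy of itself so that both the row where the jump lands and the row before it are kept. The names `jump` and `mask` are illustrative, not from the original post.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.array([[1, 2, 2], [4, 5, 4], [7, 8, 18]]),
                   columns=['A', 'B', 'C'])

# True where column A rose by at least 2 relative to the previous row.
# The first row has no previous row, so its diff is NaN and compares False.
jump = df2['A'].diff() >= 2

# Keep the row where the jump lands and also the row just before it.
# fill_value=False keeps the shifted mask boolean instead of introducing NaN.
mask = jump | jump.shift(-1, fill_value=False)

result = df2[mask]
print(result)
```

With this data every row is either the start or the end of a jump of at least 2 in column A, so all three rows are selected.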

Update :

Alternative use case :

columns = ['A','B', 'C']
data = np.array([[1,2,2] , [2,5,3], [7,8,16]])
df2 = pd.DataFrame(data,columns=columns)
df2[df2.A.diff().shift(-1) >= 2]

Returned is :

    A   B   C
1   2   5   3

but I'm expecting:

    A   B   C
1   2   5   3
2   7   8   16

So in this case I expect the current and next rows to be returned, as the difference between 2 and 7 (column A of rows 2 5 3 and 7 8 16) is > 2.
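To see why only one row comes back, it may help to look at the intermediate Series. This is a small diagnostic sketch on the same data; the names `step`, `next_step`, `keep` and `steps` are made up for illustration.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.array([[1, 2, 2], [2, 5, 3], [7, 8, 16]]),
                   columns=['A', 'B', 'C'])

step = df2['A'].diff()        # NaN, 1.0, 5.0 (change from the previous row)
next_step = step.shift(-1)    # 1.0, 5.0, NaN (change to the next row)
keep = next_step >= 2         # False, True, False (NaN >= 2 is False)

# Side-by-side view of each stage of the expression.
steps = pd.DataFrame({'A': df2['A'], 'diff': step,
                      'diff.shift(-1)': next_step, 'keep': keep})
print(steps)
```

The last row has no next row, so its shifted difference is NaN, and a comparison against NaN is always False. That is why the row 7 8 16 is dropped.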

Update 2 :

Edge case: if the last value being compared is < 2, then the row is ignored

columns = ['A','B', 'C']
data = np.array([[2,2,2] , [3,5,3], [5,8,16], [6,8,16]])
df2 = pd.DataFrame(data,columns=columns)

df2[df2.A.diff().shift(-1).ffill() >= 2]

returns :

    A   B   C
1   3   5   3

Upvotes: 1

Views: 132

Answers (1)

jezrael

Reputation: 862641

I believe you need diff with shift(-1), with the trailing NaN replaced by ffill:

a = df2[df2.A.diff().shift(-1).ffill() >= 2]
#same as
a = df2[df2.A.diff().shift(-1).ffill().ge(2)]
print (a)

   A  B   C
1  2  5   3
2  7  8  16
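A rough sketch of what ffill does here, wrapped in a hypothetical helper (`rows_before_jump` is not a pandas function, just a name for this illustration). ffill copies the last valid shifted difference into the final slot, so the last row is kept only when the row before it is kept. The second frame is the four-row data from the question's second update.

```python
import numpy as np
import pandas as pd

def rows_before_jump(df, col='A', threshold=2):
    """Hypothetical helper wrapping the diff/shift/ffill expression."""
    # Change to the next row; the last slot is NaN until ffill copies
    # the previous row's value into it.
    next_step = df[col].diff().shift(-1).ffill()
    return df[next_step.ge(threshold)]

three = pd.DataFrame(np.array([[1, 2, 2], [2, 5, 3], [7, 8, 16]]),
                     columns=list('ABC'))
four = pd.DataFrame(np.array([[2, 2, 2], [3, 5, 3], [5, 8, 16], [6, 8, 16]]),
                    columns=list('ABC'))

print(rows_before_jump(three))  # rows with index 1 and 2
print(rows_before_jump(four))   # row with index 1 only
```

In the four-row case the forward-filled value for the last row is 1 (the difference 6 - 5), so the last row is not selected, which matches the edge case described in the question.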

Upvotes: 1
