oren_isp

Reputation: 779

pandas dataframe filter by sequence of values in a specific column

I have a dataframe

   A  B  C
   1  2  3
   2  3  4
   3  8  7

I want to keep only the rows that form the consecutive sequence 3, 4 in column C (in this example, the first two rows).

What is the best way to do this?

Upvotes: 3

Views: 1211

Answers (2)

jezrael

Reputation: 863166

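You can use rolling for a general solution that works with any pattern:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 3, 4], 'C': [3, 4, 7]})

pat = np.asarray([3, 4])
N = len(pat)

# 1.0 marks the last row of each window of N rows that equals pat
mask = (df['C'].rolling(window=N, min_periods=N)
               .apply(lambda x: (x == pat).all(), raw=True)
               .mask(lambda x: x == 0)
               .bfill(limit=N - 1)  # extend each match back over its first N-1 rows
               .fillna(0)
               .astype(bool))

df = df[mask]
print(df)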
   A  B  C
0  1  2  3
1  2  3  4

Explanation:

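  • use rolling.apply to test each window of N rows against the pattern (1.0 for a match, 0.0 otherwise, NaN for the first N-1 rows)
  • replace the 0s with NaN using mask
  • use bfill with limit=N-1 to copy each 1.0 back onto the N-1 rows before it, so every row of the matched sequence is marked
  • replace the remaining NaNs with 0 using fillna
  • finally, cast to bool with astype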
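To make the steps above concrete, here is a small sketch that runs the same chain one step at a time on the example frame and prints each intermediate Series. The intermediate names (matched, marked, filled) are illustrative only; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 3, 4], 'C': [3, 4, 7]})
pat = np.asarray([3, 4])
N = len(pat)

# Step 1: 1.0 where the window ending at this row equals pat,
# 0.0 where it does not, NaN for the first N-1 rows (incomplete window)
matched = df['C'].rolling(window=N, min_periods=N).apply(
    lambda x: (x == pat).all(), raw=True)

# Step 2: turn non-matches (0.0) into NaN so only the 1.0 markers remain
marked = matched.mask(lambda x: x == 0)

# Step 3: copy each 1.0 marker back onto the N-1 rows before it
filled = marked.bfill(limit=N - 1)

# Steps 4 and 5: leftover NaNs become 0 -> False, markers become True
mask = filled.fillna(0).astype(bool)

for name, s in [('matched', matched), ('marked', marked),
                ('filled', filled), ('mask', mask)]:
    print(name, s.tolist())
```

For the example frame this prints [nan, 1.0, 0.0], then [nan, 1.0, nan], then [1.0, 1.0, nan], and finally [True, True, False], which selects rows 0 and 1.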

Upvotes: 4

Zero

Reputation: 76947

Use shift:

In [1085]: s = df.eq(3).any(axis=1) & df.shift(-1).eq(4).any(axis=1)

In [1086]: df[s | s.shift(fill_value=False)]
Out[1086]:
   A  B  C
0  1  2  3
1  2  3  4
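The shift answer checks every column (any(axis=1)) and handles exactly two values. A minimal sketch of the same shift idea, restricted to column C and extended to a pattern of any length, could look like this; the helper name sequence_mask is hypothetical.

```python
import pandas as pd


def sequence_mask(col, pat):
    """Hypothetical helper: mark every row of col that belongs to a
    consecutive run equal to pat."""
    n = len(pat)
    # start[i] is True when col[i], col[i+1], ..., col[i+n-1] equal pat
    start = pd.Series(True, index=col.index)
    for i, value in enumerate(pat):
        # shifting up by i lines row i+i' up with row i; the NaNs
        # introduced at the end compare unequal, so they give False
        start = start & col.shift(-i).eq(value)
    # also mark the n-1 rows that follow each starting row
    keep = start.copy()
    for i in range(1, n):
        keep = keep | start.shift(i, fill_value=False)
    return keep


df = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 3, 4], 'C': [3, 4, 7]})
print(df[sequence_mask(df['C'], [3, 4])])
```

Because it only compares column C, a 3 in column B followed by a 4 in the next row no longer counts as a match, whereas the any(axis=1) version would accept it.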

Upvotes: 2
