Dinosaurius
Dinosaurius

Reputation: 8628

How to extract rows with the given sequence of values for 2 subsequent columns?

How can I find the sequence of values AAA, BBB among columns COL_1, COL_2, COL_3, ....,COL_X (where X is some number, e.g. 200, so enumeration is not a good solution)?

For example, if there is a dataframe df, then the row 1 and 4 should be the output:

df=
ID    COL_1   COL_2   COL_3   COL_4
1     AAA     BBB     CCC     DDD
2     DDD     AAA     CCC     BBB
3     BBB     AAA     DDD     CCC
4     CCC     AAA     BBB     DDD

Upvotes: 2

Views: 63

Answers (2)

Zero
Zero

Reputation: 76917

You could join the relevant columns of row as string and then search for AAA,BBB pattern.

In [152]: df.filter(regex='COL_').apply(lambda x: 'AAA,BBB' in ','.join(x), axis=1)
Out[152]:
0     True
1    False
2    False
3     True
dtype: bool

If you numeric values, convert them to string using map

In [166]: df.apply(lambda x: 'AAA,BBB' in ','.join(map(str, x)), axis=1)
Out[166]:
0     True
1    False
2    False
3     True
dtype: bool

In [175]: df[df.apply(lambda x: 'AAA,BBB' in ','.join(map(str, x)), axis=1)]
Out[175]:
   ID COL_1 COL_2 COL_3 COL_4
0   1   AAA   BBB   CCC   DDD
3   4   CCC   AAA   BBB    DD

Upvotes: 3

Mike Müller
Mike Müller

Reputation: 85442

You can go through all columns shifted by one and zipped in pairs:

def func(x):
    return any(x1 == 'AAA' and x2 == 'BBB' for x1, x2 in 
               zip(x.slice_shift(1), x.slice_shift(-1)))

print(df[df.apply(func, axis=1)])

For this dataframe:

   COL_1 COL_2 COL_3 COL_4  COL_5
ID                               
1    AAA   BBB   CCC   DDD     10
2    DDD   AAA   CCC   BBB     20
3    BBB   AAA   DDD   CCC     30
4    CCC   AAA   BBB   DDD     40

the output looks like this:

   COL_1 COL_2 COL_3 COL_4  COL_5
ID                               
1    AAA   BBB   CCC   DDD     10
4    CCC   AAA   BBB   DDD     40

The x.slice_shift(1) is equivalent to x[1:] but does not copy the data.

Upvotes: 2

Related Questions