Reputation: 8628
How can I find the sequence of values AAA
, BBB
among columns COL_1
, COL_2
, COL_3
, ....,COL_X
(where X
is some number, e.g. 200, so enumeration is not a good solution)?
For example, if there is a dataframe df
, then the row 1 and 4 should be the output:
df=
ID COL_1 COL_2 COL_3 COL_4
1 AAA BBB CCC DDD
2 DDD AAA CCC BBB
3 BBB AAA DDD CCC
4 CCC AAA BBB DDD
Upvotes: 2
Views: 63
Reputation: 76917
You could join the relevant columns of row as string and then search for AAA,BBB
pattern.
In [152]: df.filter(regex='COL_').apply(lambda x: 'AAA,BBB' in ','.join(x), axis=1)
Out[152]:
0 True
1 False
2 False
3 True
dtype: bool
If you numeric values, convert them to string using map
In [166]: df.apply(lambda x: 'AAA,BBB' in ','.join(map(str, x)), axis=1)
Out[166]:
0 True
1 False
2 False
3 True
dtype: bool
In [175]: df[df.apply(lambda x: 'AAA,BBB' in ','.join(map(str, x)), axis=1)]
Out[175]:
ID COL_1 COL_2 COL_3 COL_4
0 1 AAA BBB CCC DDD
3 4 CCC AAA BBB DD
Upvotes: 3
Reputation: 85442
You can go through all columns shifted by one and zipped in pairs:
def func(x):
return any(x1 == 'AAA' and x2 == 'BBB' for x1, x2 in
zip(x.slice_shift(1), x.slice_shift(-1)))
print(df[df.apply(func, axis=1)])
For this dataframe:
COL_1 COL_2 COL_3 COL_4 COL_5
ID
1 AAA BBB CCC DDD 10
2 DDD AAA CCC BBB 20
3 BBB AAA DDD CCC 30
4 CCC AAA BBB DDD 40
the output looks like this:
COL_1 COL_2 COL_3 COL_4 COL_5
ID
1 AAA BBB CCC DDD 10
4 CCC AAA BBB DDD 40
The x.slice_shift(1)
is equivalent to x[1:]
but does not copy the data.
Upvotes: 2