Reputation: 2131
Good morning,
Given a dataframe that contains text data, such as:
import pandas

df = pandas.DataFrame({
    'a': ['first', 'second', 'third'],
    'b': ['null', 'third', 'first']})
I can select rows that contain the word 'first'
by:
df.a.str.contains('first') | df.b.str.contains('first')
which would yield
0 True
1 False
2 True
dtype: bool
To apply the same condition to dozens of columns I could use isin, but it does not seem to work if I need to substitute 'first' with a regex, as in regex = '(?=.*first)(?=.*second)'.
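For illustration, a minimal sketch of what I mean (isin compares literal cell values across all columns at once, so the exact-match case works, but a regex string is compared verbatim and never matches):

# Exact-match case works across every column in one call:
df.isin(['first']).any(axis=1)
# 0     True
# 1    False
# 2     True
# dtype: bool

# But isin tests literal membership, so the regex never hits:
df.isin(['(?=.*first)(?=.*second)']).any(axis=1)  # all False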
Is there a more Pythonic and elegant way to select on multiple columns, rather than just concatenating several single-column df.<column_name>.str.contains(regex) conditions with | in the code? Thanks
Upvotes: 1
Views: 2600
Reputation: 16144
Why don't we use applymap on the entire DataFrame? It works cell by cell rather than column by column, which should make it easier for you to apply if-else conditions (I hope):
In [62]: l = ['first', 'second']
In [63]: df
Out[63]:
a b
0 first null
1 second third
2 third first
In [64]: df.applymap(lambda v: v in l)
Out[64]:
a b
0 True False
1 True False
2 False True
(Thanks to @Pythonic for this update)
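Since the goal is selecting rows, one way (a sketch, not part of the snippet above) is to collapse this boolean frame into a row mask:

mask = df.applymap(lambda v: v in l)
df[mask.any(axis=1)]   # rows where any cell is in l; use .all(axis=1) to require every cell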
We can supply a regex in applymap like so:
import re

regex = '(^fi)'
df.applymap(lambda v: bool(re.search(regex, v)))
Out[38]:
a b
0 True False
1 False False
2 False True
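Note that the lookahead pattern from the question, '(?=.*first)(?=.*second)', needs to see all of a row's text at once, so a cell-by-cell applymap can never satisfy it. One sketch is to join each row into a single string first:

regex = '(?=.*first)(?=.*second)'
df.apply(lambda row: bool(re.search(regex, ' '.join(row))), axis=1)
# with this sample frame every row is False, since no row contains both words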
The following example uses a compiled pattern with re.IGNORECASE enabled:
In [44]: df = pandas.DataFrame({
....: 'a':['First', 'second', 'NULL'],
....: 'b':['null', 'third', 'first']})
In [45]: regex = re.compile('(^fi)', flags=re.IGNORECASE)
In [46]: df.applymap(lambda v: bool(re.search(regex, v)))
Out[46]:
a b
0 True False
1 False False
2 False True
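As a vectorized alternative to applymap, the per-column str.contains from the question can be applied to every column in one pass (a sketch; str.contains accepts case=False in place of a compiled pattern):

df.apply(lambda col: col.str.contains('^fi', case=False)).any(axis=1)
# 0     True
# 1    False
# 2     True
# dtype: bool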
Upvotes: 2