I'm afraid the solution is obvious or the question is a duplicate, but I couldn't find an answer yet: I have a pandas data frame that contains long strings, and I need two substrings to match at the same time. I found the "or" version multiple times, but I haven't found the "and" solution yet.
Please assume the following data frame, where the interesting information ("element type" and "subpart type") is separated by a random in-between element:
import pandas as pd
data = pd.DataFrame({"col1":["element1_random_string_subpartA"
, "element2_ran_str_subpartA"
, "element1_some_text_subpartB"
, "element2_some_other_text_subpartB"]})
I'd now like to filter for all rows that contain both element1 and subpartA.
data.col1.str.contains("element1|subpartA")
returns a boolean Series
True
True
True
False
which is the expected result for OR. But I need an AND combination, and
data.col1.str.contains("element1&subpartA")
returns
False
False
False
False
although I'd expect
True
False
False
False
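The all-False result follows from regex syntax: & is not an operator there, so the pattern element1&subpartA only matches the literal text "element1&subpartA", which none of the rows contain. A minimal sketch with the standard re module (the second test string is made up purely for illustration):

```python
import re

pattern = "element1&subpartA"

# "&" is an ordinary character in a regex, so it has to appear literally in the text
print(bool(re.search(pattern, "element1_random_string_subpartA")))  # False
print(bool(re.search(pattern, "element1&subpartA")))  # True
```

pandas' str.contains uses the same regex engine by default, which is why every row comes back False.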
An AND in a regex is not easy; one way is to combine lookaheads:
m = data.col1.str.contains(r'(?=.*subpartA)(?=.*element1)')
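Each (?=.*term) lookahead checks that term occurs somewhere after the current position without consuming any characters. Because of that, stacking the lookaheads requires all of the terms to be present, in any order. The sketch below builds such a pattern from a list of terms. The helper name all_terms_pattern is hypothetical, and re.escape is applied on the assumption that the terms are meant as plain text, not as regex fragments:

```python
import re

import pandas as pd


def all_terms_pattern(terms):
    """Hypothetical helper: build a regex that matches only if every term occurs.

    Each term becomes a zero-width lookahead (?=.*term), so the order of the
    terms in the string does not matter. re.escape makes characters such as
    '.' or '+' inside a term match literally.
    """
    return "".join(f"(?=.*{re.escape(t)})" for t in terms)


data = pd.DataFrame({"col1": ["element1_random_string_subpartA",
                              "element2_ran_str_subpartA",
                              "element1_some_text_subpartB",
                              "element2_some_other_text_subpartB"]})

pattern = all_terms_pattern(["element1", "subpartA"])
m = data.col1.str.contains(pattern)
print(pattern)     # (?=.*element1)(?=.*subpartA)
print(m.tolist())  # [True, False, False, False]
```

One caveat: . does not match a newline by default. If a cell can contain line breaks, passing flags=re.DOTALL to str.contains lets .* cross them.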
Simpler is to chain both conditions with & for bitwise AND:
m = data.col1.str.contains("subpartA") & data.col1.str.contains("element1")
print(m)
0 True
1 False
2 False
3 False
Name: col1, dtype: bool
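The & chaining extends to any number of substrings, and the resulting mask can be used directly to filter the rows, which is what the question asks for. The following is a minimal sketch that assumes a hypothetical list named terms. It folds the per-term masks together with functools.reduce and operator.and_, which is the same element-wise & as above. It also passes regex=False on the assumption that the terms are plain substrings rather than patterns:

```python
import operator
from functools import reduce

import pandas as pd

data = pd.DataFrame({"col1": ["element1_random_string_subpartA",
                              "element2_ran_str_subpartA",
                              "element1_some_text_subpartB",
                              "element2_some_other_text_subpartB"]})

terms = ["element1", "subpartA"]  # hypothetical list of required substrings

# One boolean Series per term; regex=False does plain substring matching
masks = [data.col1.str.contains(t, regex=False) for t in terms]

# Combine the masks element-wise with & (operator.and_ calls Series.__and__)
m = reduce(operator.and_, masks)

# Boolean indexing keeps only the rows where every term was found
filtered = data[m]
print(filtered)
```

Adding a third condition then only means appending to terms rather than writing another & clause.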