MichaelA

Reputation: 1986

Using str.contains to look for two substrings with pandas in python

I'm afraid the solution is obvious or the question is a duplicate, but I couldn't find an answer yet: I have a pandas data frame that contains long strings, and I need two substrings to be matched at the same time. I found the "or" version multiple times, but I haven't found the "and" solution yet.

Please assume the following data frame, where the interesting information ("element type" and "subpart type") is separated by a random in-between element:

import pandas as pd
data = pd.DataFrame({"col1":["element1_random_string_subpartA"
                           , "element2_ran_str_subpartA"
                           , "element1_some_text_subpartB"
                           , "element2_some_other_text_subpartB"]})

I'd now like to filter for all lines that contain element1 and subpartA.

data.col1.str.contains("element1|subpartA")

returns a boolean Series

True 
True
True
False

which is the expected result. But I need an "And" combination and

data.col1.str.contains("element1&subpartA")

returns

False
False
False
False

although I'd expect

True
False 
False
False

Upvotes: 1

Views: 164

Answers (1)

jezrael

Reputation: 862841

A regex AND is not easy, because regex has no AND operator (& is matched as a literal character); it needs lookaheads:

m = data.col1.str.contains(r'(?=.*subpartA)(?=.*element1)')  
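Each (?=.*X) in the pattern above is a lookahead: it asserts that X occurs somewhere later in the string without consuming any characters, so the row matches only if every lookahead succeeds. A minimal sketch of building such a pattern from a list of substrings follows; the helper name all_of_pattern is hypothetical, and re.escape is used on the assumption that the substrings should be matched literally:

```python
import re
import pandas as pd

data = pd.DataFrame({"col1": ["element1_random_string_subpartA",
                              "element2_ran_str_subpartA",
                              "element1_some_text_subpartB",
                              "element2_some_other_text_subpartB"]})

def all_of_pattern(substrings):
    # Hypothetical helper: one lookahead per substring, in any order.
    # re.escape keeps regex metacharacters in the substrings literal.
    return "".join(f"(?=.*{re.escape(s)})" for s in substrings)

pattern = all_of_pattern(["element1", "subpartA"])
mask = data["col1"].str.contains(pattern)
print(mask.tolist())
```

Note that . does not match newlines by default, so this sketch assumes single-line strings.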

Simpler is to chain both conditions with & for bitwise (element-wise) AND:

m = data.col1.str.contains("subpartA") & data.col1.str.contains("element1")
print(m)
0     True
1    False
2    False
3    False
Name: col1, dtype: bool
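If more than two substrings are required, the chained & can be generalized. The following is only a sketch: the list name required is an assumption for illustration, and regex=False is passed on the assumption that the substrings are plain text rather than patterns. The resulting mask can then be used to filter the rows:

```python
from functools import reduce
import operator
import pandas as pd

data = pd.DataFrame({"col1": ["element1_random_string_subpartA",
                              "element2_ran_str_subpartA",
                              "element1_some_text_subpartB",
                              "element2_some_other_text_subpartB"]})

# Illustrative list of substrings that must all be present.
required = ["element1", "subpartA"]

# One boolean mask per substring; regex=False treats each as literal text.
masks = [data["col1"].str.contains(s, regex=False) for s in required]

# Combine all masks with element-wise AND.
m = reduce(operator.and_, masks)

# Keep only the rows where every substring was found.
filtered = data[m]
print(filtered)
```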

Upvotes: 1
