I would like to check whether the pandas DataFrame column id contains any of the following substrings: '.F1', '.N1', '.FW', '.SP'.
I am currently using the following code:
searchfor = ['.F1', '.N1', '.FW', '.SP']
mask = (df["id"].str.contains('|'.join(searchfor)))
The id column looks like this:
ID
0 F611B4E369F1D293B5
1 10302389527F190F1A
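With the two sample IDs above, a minimal self-contained reproduction might look like the sketch below. It assumes the column is named id, as in the code above (the printed header shows ID).

```python
import pandas as pd

# Sample data from the question; column named "id" to match the code above
df = pd.DataFrame({"id": ["F611B4E369F1D293B5", "10302389527F190F1A"]})

searchfor = ['.F1', '.N1', '.FW', '.SP']
# Joined pattern is '.F1|.N1|.FW|.SP' (str.contains treats it as a regex by default)
mask = df["id"].str.contains('|'.join(searchfor))

print(mask.tolist())  # [True, True], although neither ID contains a '.'
```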
What I actually want is to check whether the id column contains any of the four substrings including the leading '.'. For some reason, rows that merely contain F1 (without a preceding '.') are still picked up by the filter, even though neither example above contains '.F1'. I would really appreciate it if someone could let me know how to solve this issue. Thank you so much.
You can use re.escape() to escape the regex metacharacters in each string, so that you don't need to escape every string in the word list searchfor by hand (no need to change the definition of searchfor):
import re
searchfor = ['.F1', '.N1', '.FW', '.SP'] # no need to escape each string
pattern = '|'.join(map(re.escape, searchfor)) # use re.escape() with map()
mask = (df["id"].str.contains(pattern))
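The reason escaping matters: str.contains interprets its pattern as a regular expression by default (regex=True), and an unescaped '.' matches any single character, so '.F1' matches the '9F1' inside F611B4E369F1D293B5. A small sketch using only the standard re module compares the unescaped and escaped patterns on that ID and on a made-up string that does contain a literal '.F1':

```python
import re

searchfor = ['.F1', '.N1', '.FW', '.SP']
raw = '|'.join(searchfor)                      # each '.' is a regex wildcard
escaped = '|'.join(map(re.escape, searchfor))  # each '.' becomes a literal '\.'

sample = "F611B4E369F1D293B5"                  # first ID from the question
print(bool(re.search(raw, sample)))            # True: '9F1' matches '.F1'
print(bool(re.search(escaped, sample)))        # False: no literal '.F1'
print(bool(re.search(escaped, "AB.F1C")))      # True: made-up ID with '.F1'
```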
re.escape() will escape each string for you:
print(pattern)
\.F1|\.N1|\.FW|\.SP
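If regular expressions are not needed at all, another option (an alternative to the re.escape() approach, not part of the original answer) is to pass regex=False so that each substring is matched literally, then combine the per-substring masks with a logical OR. The sketch below uses the two sample IDs from the question plus a hypothetical third ID that does contain '.F1':

```python
import operator
from functools import reduce

import pandas as pd

# Two IDs from the question plus a hypothetical ID containing a literal '.F1'
df = pd.DataFrame({"id": ["F611B4E369F1D293B5", "10302389527F190F1A", "ABC.F1XYZ"]})
searchfor = ['.F1', '.N1', '.FW', '.SP']

# regex=False turns each str.contains call into a plain substring test
masks = [df["id"].str.contains(s, regex=False) for s in searchfor]
mask = reduce(operator.or_, masks)  # element-wise OR: True if any substring occurs

print(mask.tolist())  # [False, False, True]
```

This trades one regex scan for one literal scan per substring, which is usually fine for a short list like searchfor.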