Reputation: 347
I am trying to check whether a certain list includes elements of another list.
I am using the following line of code:
check = df_1['website'].str.contains(df_2['website'].tolist()[i])
The problem is that I get false positives when a string in one DataFrame only partially matches a string in the other.
For example I am looking to find if the following string in df_2['website'] is contained in df_1['website']:
sample_text_to_check
Since df_1['website'] contains the following string:
text_to_check
It results in a positive match. I would like to check for exact matches only (i.e. the entire string is matched, not just part of it).
How can I do that? The lists are 200k lines long and contain many different strings.
Upvotes: 0
Views: 408
Reputation: 521073
You could just place ^ and $ boundary markers around the string:
import re  # re.escape keeps regex metacharacters such as "." in a URL literal
check = df_1['website'].str.contains(r'^' + re.escape(df_2['website'].tolist()[i]) + r'$')
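Since only exact matches are wanted and the columns are around 200k rows long, a set-membership test may be a simpler alternative to building one regex per value. The following is a minimal sketch, not a drop-in replacement: it swaps the regex approach for Series.isin, and the sample values are made up for illustration (only the 'website' column name comes from the question).

```python
import pandas as pd

# Hypothetical sample data standing in for the question's DataFrames.
df_1 = pd.DataFrame({"website": ["text_to_check", "example.com", "other.org"]})
df_2 = pd.DataFrame({"website": ["sample_text_to_check", "example.com", "exampleXcom"]})

# True where a df_1 value is exactly equal to some df_2 value.
exact_in_df_2 = df_1["website"].isin(df_2["website"])

# Reverse direction: True where a df_2 value appears verbatim in df_1.
found_in_df_1 = df_2["website"].isin(df_1["website"])

print(exact_in_df_2.tolist())
print(found_in_df_1.tolist())
```

Because isin compares whole values rather than regex patterns, "text_to_check" does not match "sample_text_to_check", and the "." in a domain name is never treated as a wildcard. It also checks every row in one call instead of looping over an index i.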
Upvotes: 1