Reputation: 771
I'm trying to do a regex search across two lists that contain similar, but not identical, strings. How can I fix the problem below?
Script:
import re
list1 = [
    'juice',
    'potato']
list2 = [
    'juice;44',
    'potato;55',
    'apple;66']
correlation = []
for a in list1:
    r = re.compile(r'\b{}\b'.format(a), re.I)
    for b in list2:
        if r.search(b):
            pass
        else:
            correlation.append(b)
print(correlation)
Output:
['potato;55', 'apple;66', 'juice;44', 'apple;66']
Desired Output:
['apple;66']
Upvotes: 1
Views: 88
Reputation: 781139
Convert list1 into a single regexp that matches all the words. Then append each element of list2 that doesn't match the regexp.
regex = re.compile(r'\b(?:' + '|'.join(re.escape(word) for word in list1) + r')\b')
correlation = [a for a in list2 if not regex.search(a)]
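As a self-contained sketch of the approach above, the two lines can be wrapped in a small helper. The function name filter_unmatched and the empty-list guard are my own additions for illustration, and re.I is carried over from the question's code rather than from this answer:

```python
import re

def filter_unmatched(words, items):
    """Return the items that contain none of the words as a whole word."""
    # Assumption: with no words there is nothing to exclude. Without this
    # guard the pattern would be r'\b(?:)\b', which matches at any word boundary.
    if not words:
        return list(items)
    # re.escape keeps regex metacharacters inside a word from being
    # interpreted as pattern syntax.
    pattern = r'\b(?:' + '|'.join(re.escape(word) for word in words) + r')\b'
    regex = re.compile(pattern, re.I)  # re.I: case-insensitive, as in the question
    return [item for item in items if not regex.search(item)]

list1 = ['juice', 'potato']
list2 = ['juice;44', 'potato;55', 'apple;66']
result = filter_unmatched(list1, list2)
print(result)  # ['apple;66']
```

Building one alternation means each element of list2 is scanned once, instead of once per word in list1.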
Upvotes: 1
Reputation: 626926
You can create a single regex pattern to match terms from list1 as whole words, and then use filter:
import re
list1 = ['juice', 'potato']
list2 = ['juice;44', 'potato;55', 'apple;66']
rx = re.compile(r'\b(?:{})\b'.format("|".join(list1)))
print(list(filter(lambda x: not rx.search(x), list2)))
# => ['apple;66']
See the Python demo.
The regex is \b(?:juice|potato)\b (see its online demo). The \b is a word boundary, so the regex matches juice or potato as whole words. filter(lambda x: not rx.search(x), list2) removes all items from list2 that match the regex.
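To make the whole-word behaviour concrete, here is a small check of the same pattern against a few strings. The sample strings are made up for illustration and are not from the question:

```python
import re

rx = re.compile(r'\b(?:juice|potato)\b')

# Hypothetical sample strings, chosen to exercise the word boundaries.
samples = ['juice;44', 'juices;44', 'grapejuice;1', 'juice_44', 'hot potato']

# Map each sample to whether the pattern is found anywhere in it.
matches = {s: bool(rx.search(s)) for s in samples}
for s, hit in matches.items():
    print(s, hit)
# juice;44 True       (';' is not a word character, so \b holds after 'juice')
# juices;44 False     ('s' follows 'juice', so there is no boundary)
# grapejuice;1 False  ('e' precedes 'juice', so there is no boundary)
# juice_44 False      ('_' counts as a word character)
# hot potato True
```

Note that this pattern is case-sensitive, so 'Juice;44' would not match unless re.I is passed to re.compile as in the question's code.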
Upvotes: 2
Reputation: 10959
First, the inner and outer for-loops must be swapped to make this work. Then you can set a flag to False before the inner for-loop, set it to True inside the inner loop if you find a match, and after the loop add the item to correlation if the flag is still False.
The final version looks like:
import re
list1 = [
    'juice',
    'potato']
list2 = [
    'juice;44',
    'potato;55',
    'apple;66']
correlation = []
for b in list2:
    found = False
    for a in list1:
        r = re.compile(r'\b{}\b'.format(a), re.I)
        if r.search(b):
            found = True
    if not found:
        correlation.append(b)
print(correlation)
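The same flag logic can probably be written more compactly with any(), which stops at the first match. This is only a sketch of a variant, not part of the answer above; compiling the patterns once up front and applying re.escape are my own assumptions about what is wanted:

```python
import re

list1 = ['juice', 'potato']
list2 = ['juice;44', 'potato;55', 'apple;66']

# Compile each pattern once, instead of once per element of list2.
patterns = [re.compile(r'\b{}\b'.format(re.escape(a)), re.I) for a in list1]

# Keep b only if none of the patterns is found in it;
# any() plays the role of the found flag.
correlation = [b for b in list2 if not any(p.search(b) for p in patterns)]
print(correlation)  # ['apple;66']
```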
Upvotes: 1