Luis Henrique

Reputation: 771

Searching for similar values within a regex string

I'm trying to use a regex to compare two lists whose strings are similar but not identical. I want to keep only the items of list2 that do not match any entry in list1. How can I fix the script below?

Script:

import re

list1 = [
'juice',
'potato']

list2 = [
'juice;44',
'potato;55',
'apple;66']

correlation = []
for a in list1:
    r = re.compile(r'\b{}\b'.format(a), re.I)
    for b in list2:
        if r.search(b):
            pass
        else:
            correlation.append(b)

print(correlation)

Output:

['potato;55', 'apple;66', 'juice;44', 'apple;66']

Desired Output:

['apple;66']

Upvotes: 1

Views: 88

Answers (3)

Barmar

Reputation: 781139

Combine list1 into a single regexp that matches any of the words, then keep only the elements of list2 that don't match that regexp.

regex = re.compile(r'\b(?:' + '|'.join(re.escape(word) for word in list1) + r')\b')
correlation = [a for a in list2 if not regex.search(a)]
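
As a fuller sketch of this approach, the snippet below builds the combined pattern from the question's list1 and filters list2 with it. The re.I flag is carried over from the question's script, and the extra 'Potato;77' entry is a made-up item added only to show the case-insensitive match; neither is part of the one-liner above.

```python
import re

list1 = ['juice', 'potato']
# 'Potato;77' is a hypothetical extra item, not in the original question.
list2 = ['juice;44', 'potato;55', 'apple;66', 'Potato;77']

# Escape each term so any regex metacharacters in it are matched literally.
alternation = '|'.join(re.escape(word) for word in list1)
regex = re.compile(r'\b(?:' + alternation + r')\b', re.I)

# Keep only the items that contain none of the words from list1.
correlation = [item for item in list2 if not regex.search(item)]
print(correlation)  # ['apple;66']
```

Compiling one alternation means each item of list2 is scanned once, instead of once per word in list1.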

Upvotes: 1

Wiktor Stribiżew

Reputation: 626926

You can create a single regex pattern to match terms from list1 as whole words, and then use filter:

import re

list1 = ['juice', 'potato']
list2 = ['juice;44', 'potato;55', 'apple;66']

rx = re.compile(r'\b(?:{})\b'.format("|".join(list1)))
print(list(filter(lambda x: not rx.search(x), list2)))
# => ['apple;66']

The resulting regex is \b(?:juice|potato)\b. \b is a word boundary, so the regex matches juice or potato only as whole words. filter(lambda x: not rx.search(x), list2) drops every item of list2 that matches the regex.
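
To illustrate what the word boundaries change, here is a small sketch that runs the same pattern against a few sample strings. The strings in samples are invented for illustration and do not come from the question.

```python
import re

rx = re.compile(r'\b(?:juice|potato)\b')

# Hypothetical sample strings, chosen to exercise the word boundaries.
samples = [
    'juice;44',        # ';' is not a word character, so \b holds after 'juice'
    'juicer;44',       # 'r' follows 'juice', so there is no boundary there
    'sweetpotato;55',  # 't' precedes 'potato', so there is no boundary there
    'juice_44',        # '_' counts as a word character, so no boundary
    'potato 55',       # a space delimits the word
]

matches = [s for s in samples if rx.search(s)]
print(matches)  # ['juice;44', 'potato 55']
```

Without the \b anchors, every one of these samples would match, because juice or potato appears in each as a substring.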

Upvotes: 2

Michael Butscher

Reputation: 10959

First, the inner and outer for loops must be swapped for this to work: iterate over list2 in the outer loop and over list1 in the inner loop.

Then set a flag to False before the inner loop, set it to True inside the inner loop when a match is found, and after the inner loop append the item to correlation if the flag is still False.

The result looks like this:

import re

list1 = [
'juice',
'potato']

list2 = [
'juice;44',
'potato;55',
'apple;66']

correlation = []
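for b in list2:
    found = False  # becomes True once any word from list1 matches b

    for a in list1:
        r = re.compile(r'\b{}\b'.format(a), re.I)
        if r.search(b):
            found = True
            break  # one match is enough, skip the remaining words

    if not found:
        correlation.append(b)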

print(correlation)
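
The flag-based loop above can also be written more compactly with any(), which stops at the first matching pattern. This is only a sketch of the same idea, not a different algorithm. The patterns list and the use of re.escape are additions of mine, assuming the words in list1 should be matched literally.

```python
import re

list1 = ['juice', 'potato']
list2 = ['juice;44', 'potato;55', 'apple;66']

# Compile each pattern once, outside the loop over list2.
patterns = [re.compile(r'\b{}\b'.format(re.escape(a)), re.I) for a in list1]

# any() plays the role of the 'found' flag: it is True as soon as one pattern matches.
correlation = [b for b in list2 if not any(p.search(b) for p in patterns)]
print(correlation)  # ['apple;66']
```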

Upvotes: 1
