How to find matching words using regex?

Question

I have strings in a text file with more than 2000 lines, like:

cool.add.come.ADD_COPY
add.cool.warm.ADD_IN
warm.cool.warm.MINUS
cool.add.go.MINUS_COPY

I have a list of more than 200 matching words, like:

store=['ADD_COPY','add.cool.warm.ADD_IN', 'warm.cool.warm.MINUS', 'MINUS_COPY']

I'm using a regular expression in the code:

import re

def find_matches(store, text):
    # text is the content of the file, already read into a string
    lst = []
    for match in re.finditer(r'[\w.]+', text):
        word = match.group()
        if word in store:
            lst.append(word)
    return lst

Then I check the returned list in a loop against my requirement.

Output I'm getting:

add.cool.warm.ADD_IN
warm.cool.warm.MINUS

If I change the pattern to \w+, I get only:

ADD_COPY
MINUS_COPY

Required output:

add.cool.warm.ADD_IN
warm.cool.warm.MINUS
ADD_COPY
MINUS_COPY

Wiktor Stribiżew · Accepted Answer

It looks like a plain substring check in a list comprehension is all you need:

results = set([item for item in store if item in text])
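A minimal sketch of why the substring check finds all four items while the tokenizing approach from the question does not. It reuses the sample data from the question; the names tokens and found are only illustrative.

```python
import re

text = """cool.add.come.ADD_COPY
add.cool.warm.ADD_IN
warm.cool.warm.MINUS
cool.add.go.MINUS_COPY"""

store = ['ADD_COPY', 'add.cool.warm.ADD_IN', 'warm.cool.warm.MINUS', 'MINUS_COPY']

# With [\w.]+ every dotted line becomes a single token, so 'ADD_COPY'
# only ever appears as part of 'cool.add.come.ADD_COPY', never on its own.
tokens = re.findall(r'[\w.]+', text)
print(tokens)

# A substring test does not care where token boundaries fall.
found = [item for item in store if item in text]
print(found)
```

Note that a substring test also reports an item that occurs inside a longer identifier, which is where the regex variant becomes useful.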

If you need a regex (in case you plan to match whole words only, or match your store items only in specific contexts), you may get the matches using

import re
text="""cool.add.come.ADD_COPY
add.cool.warm.ADD_IN
warm.cool.warm.MINUS
cool.add.go.MINUS_COPY"""

store=['ADD_COPY','add.cool.warm.ADD_IN', 'warm.cool.warm.MINUS', 'MINUS_COPY']
rx="|".join(sorted(map(re.escape, store), key=len, reverse=True))
print(re.findall(rx, text))

The regex will look like

add\.cool\.warm\.ADD_IN|warm\.cool\.warm\.MINUS|MINUS_COPY|ADD_COPY

The pattern is simply all your store items with special characters escaped, joined with | and sorted by length in descending order, so that a longer item is tried before any shorter item that is a prefix of it.
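If you do want whole-word matching, one possible sketch is to wrap the same alternation in word boundaries. This assumes that "whole word" means the item is not glued to another letter, digit or underscore, and that every store item starts and ends with a word character (true for the sample data). The names whole_word_rx, unique_matches and partial are only illustrative.

```python
import re

text = """cool.add.come.ADD_COPY
add.cool.warm.ADD_IN
warm.cool.warm.MINUS
cool.add.go.MINUS_COPY"""

store = ['ADD_COPY', 'add.cool.warm.ADD_IN', 'warm.cool.warm.MINUS', 'MINUS_COPY']

# Same alternation as above: escaped items, longest first.
alternation = "|".join(sorted(map(re.escape, store), key=len, reverse=True))

# Assumption: \b is an adequate "whole word" test because every item
# begins and ends with a word character.
whole_word_rx = re.compile(r"\b(?:" + alternation + r")\b")

matches = whole_word_rx.findall(text)
print(matches)

# Drop repeated hits while keeping first-seen order.
unique_matches = list(dict.fromkeys(matches))

# Items embedded in a longer identifier are skipped.
partial = whole_word_rx.findall("XADD_COPY ADD_COPY_OLD ADD_COPY")
print(partial)
```

A dot still counts as a boundary here, so ADD_COPY is found at the end of cool.add.come.ADD_COPY; use stricter lookarounds if that is not what you want.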
