How to find matching words using regex?

I have strings in a text file with more than 2000 lines, like:

cool.add.come.ADD_COPY
add.cool.warm.ADD_IN
warm.cool.warm.MINUS
cool.add.go.MINUS_COPY

I have a list of more than 200 matching words, like:

store=['ADD_COPY','add.cool.warm.ADD_IN', 'warm.cool.warm.MINUS', 'MINUS_COPY']

I'm using a regular expression in the code:

import re

def all(store, file):
    # Collect every run of word characters and dots that is also in store.
    lst = []
    for match in re.finditer(r'[\w.]+', file):
        words = match.group()
        if words in store:
            lst.append(words)
    return lst

Then I loop over the returned list to check it against my requirement.

Output I'm getting:

add.cool.warm.ADD_IN
warm.cool.warm.MINUS

If I change the pattern to \w+, then I get only:

ADD_COPY
MINUS_COPY

Required output:

add.cool.warm.ADD_IN
warm.cool.warm.MINUS
ADD_COPY
MINUS_COPY

Upvotes: 3

Views: 105

Answers (1)

Wiktor Stribiżew

Reputation: 626929

It appears you can get the results with a plain list comprehension that checks whether each store item occurs in the text:

results = set([item for item in store if item in text])
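As a minimal, runnable sketch of that substring check on the sample data from the question (the name `found` is only illustrative; a list is used instead of a set here so the result keeps the order of `store`):

```python
text = """cool.add.come.ADD_COPY
add.cool.warm.ADD_IN
warm.cool.warm.MINUS
cool.add.go.MINUS_COPY"""

store = ['ADD_COPY', 'add.cool.warm.ADD_IN', 'warm.cool.warm.MINUS', 'MINUS_COPY']

# Plain substring test: keep each store item that occurs anywhere in the text.
# No tokenizing is involved, so 'ADD_COPY' is found inside 'cool.add.come.ADD_COPY'.
found = [item for item in store if item in text]
print(found)
```

Note that this also reports items that occur only as part of a longer word, which may or may not be what you want.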

If you need a regex (for example, to match whole words only, or to match your store items only in specific contexts), you can get the matches using:

import re
text="""cool.add.come.ADD_COPY
add.cool.warm.ADD_IN
warm.cool.warm.MINUS
cool.add.go.MINUS_COPY"""

store=['ADD_COPY','add.cool.warm.ADD_IN', 'warm.cool.warm.MINUS', 'MINUS_COPY']
rx="|".join(sorted(map(re.escape, store), key=len, reverse=True))
print(re.findall(rx, text))

The regex will look like this:

add\.cool\.warm\.ADD_IN|warm\.cool\.warm\.MINUS|MINUS_COPY|ADD_COPY

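Basically, the pattern is an alternation of all your store items, with special characters escaped and sorted by length in descending order, so that a longer item is always tried before a shorter one that could match at the same position.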
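If whole-word matching is what you are after, one possible sketch is to wrap the same alternation in lookarounds. It assumes that "whole word" means "not directly preceded or followed by a word character (letter, digit, or underscore)"; the names `whole_word` and `sample` and the sample strings are made up for illustration:

```python
import re

store = ['ADD_COPY', 'add.cool.warm.ADD_IN', 'warm.cool.warm.MINUS', 'MINUS_COPY']

# Same alternation as above: escaped items, longest first.
alternation = "|".join(sorted(map(re.escape, store), key=len, reverse=True))

# Assumption: a match must not touch a word character on either side.
# A dot before the item is still allowed, so 'come.ADD_COPY' matches.
whole_word = re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)")

# Hypothetical input: only the first line contains a store item as a whole word.
sample = "cool.add.come.ADD_COPY\nx.PREADD_COPY\ncool.add.go.MINUS_COPY_2"
matches = whole_word.findall(sample)
print(matches)
```

Here `PREADD_COPY` and `MINUS_COPY_2` are skipped because the store item is glued to extra word characters, whereas a plain substring check would have reported both.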

Upvotes: 3
