Reputation: 174
I have strings in a text file with more than 2000 lines, like:
cool.add.come.ADD_COPY
add.cool.warm.ADD_IN
warm.cool.warm.MINUS
cool.add.go.MINUS_COPY
I have a list of more than 200 matching words, like:
store=['ADD_COPY','add.cool.warm.ADD_IN', 'warm.cool.warm.MINUS', 'MINUS_COPY']
I'm using a regular expression in the code:
def all(store, file):
    lst=[]
    for match in re.finditer(r'[\w.]+', file):
        words = match.group()
        if words in store:
            lst.append(words)
    return lst
Then I check the returned list in a loop against my requirement.
Output I'm getting:
add.cool.warm.ADD_IN
warm.cool.warm.MINUS
If I change the pattern to \w+,
I get only:
ADD_COPY
MINUS_COPY
Required output:
add.cool.warm.ADD_IN
warm.cool.warm.MINUS
ADD_COPY
MINUS_COPY
Upvotes: 3
Views: 105
Reputation: 626929
It appears you can get the results with a simple list comprehension:
results = set([item for item in store if item in text])
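As a minimal runnable sketch of that comprehension, assuming text holds the whole file contents (here just the four sample lines from the question), the result is kept as a list rather than a set so the order of store is preserved:

```python
# Sample data copied from the question; in practice `text` would be read from the file.
text = """cool.add.come.ADD_COPY
add.cool.warm.ADD_IN
warm.cool.warm.MINUS
cool.add.go.MINUS_COPY"""
store = ['ADD_COPY', 'add.cool.warm.ADD_IN', 'warm.cool.warm.MINUS', 'MINUS_COPY']

# `item in text` is a plain substring test, so 'ADD_COPY' is found
# inside 'cool.add.come.ADD_COPY' without any regex.
found = [item for item in store if item in text]
print(found)
```

Because this is a substring test, an item would also be reported when it occurs inside a longer identifier; the regex approach is the better fit when that matters.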
If you need a regex (in case you plan to match whole words only, or to match your store
items only in specific contexts), you can get the matches using:
import re
text="""cool.add.come.ADD_COPY
add.cool.warm.ADD_IN
warm.cool.warm.MINUS
cool.add.go.MINUS_COPY"""
store=['ADD_COPY','add.cool.warm.ADD_IN', 'warm.cool.warm.MINUS', 'MINUS_COPY']
rx="|".join(sorted(map(re.escape, store), key=len, reverse=True))
print(re.findall(rx, text))
The regex will look like this:
add\.cool\.warm\.ADD_IN|warm\.cool\.warm\.MINUS|MINUS_COPY|ADD_COPY
See the regex demo. Basically, the pattern is all your store items with special characters
escaped, sorted by length in descending order so that longer items are tried before shorter ones.
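For the whole-word case mentioned earlier, one possible sketch wraps the same alternation in \b word boundaries. The helper name find_whole_words is hypothetical and only for illustration; whether \b is the right boundary depends on what counts as a "word" in your data (here a dot still separates words, an underscore does not).

```python
import re

def find_whole_words(store, text):
    """Return store items found in text as whole words (hypothetical helper)."""
    # Escape each item, put longer items first, and group the alternation
    # so the \b anchors apply to every alternative.
    alternation = "|".join(sorted(map(re.escape, store), key=len, reverse=True))
    rx = re.compile(r"\b(?:" + alternation + r")\b")
    return rx.findall(text)

text = """cool.add.come.ADD_COPY
add.cool.warm.ADD_IN
warm.cool.warm.MINUS
cool.add.go.MINUS_COPY"""
store = ['ADD_COPY', 'add.cool.warm.ADD_IN', 'warm.cool.warm.MINUS', 'MINUS_COPY']

matches = find_whole_words(store, text)
print(matches)

# 'ADD_COPY' is surrounded by word characters here, so nothing is matched.
print(find_whole_words(store, "MY_ADD_COPY2"))
```

The matches come back in the order they appear in the text, not in the order of store.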
Upvotes: 3