Reputation: 85
I need some help with a regex I am writing. I have a list of words that I want to match and words that might come after them (words meaning [A-Za-z/\s]+
) I.e no parenthesis,symbols, numbers.
words = ['qtr','hard','quarter'] # keywords that must exist
test=['id:12345 cli hard/qtr Mix',
'id:12345 cli qtr 90%',
'id:12345 cli hard (red)',
'id:12345 cli hard work','Hello world']
excepted output is
['hard/qtr Mix', 'qtr', 'hard', 'hard work', None]
What I have tried so far
re.search(r'((hard|qtr|quarter)(?:[[A-Za-z/\s]+]))',x,re.I)
Upvotes: 2
Views: 228
Reputation: 3987
Idea extracted from the existing answer and made shorter :
>>> pattern = re.compile('(('+'|'.join(words)+')([a-zA-Z/ ]*))')
>>> [pattern.search(x).group(0) if pattern.search(x) else None for x in test])
['hard/qtr Mix', 'qtr ', 'hard ', 'hard work', None]
As mentioned in comment :
But it is quite inefficient, because it needs to search for same pattern twice, once for
pattern.search(x).group(0)
and the other one for ifpattern.search(x)
, and list-comprehension is not the best way to go about in such scenarios.
We can try this to overcome that issue :
>>> [v.group(0) if v else None for v in (pattern.search(x) for x in test)]
['hard/qtr Mix', 'qtr ', 'hard ', 'hard work', None]
Upvotes: 1
Reputation: 18466
The problem with the pattern you have i.e.'((hard|qtr|quarter)(?:[[A-Za-z/\s]+]))',
you have \s
inside squared brackets []
which means to match the characters individually i.e. either \
or s
, instead, you can just use space character i.e.
You can join all the words in words
list by |
to create the pattern '((qtr|hard|quarter)([a-zA-Z/ ]*))'
, then search for the pattern in each of strings in the list, if the match is found, take the group 0 and append it to the resulting list, else, append None
:
pattern = re.compile('(('+'|'.join(words)+')([a-zA-Z/ ]*))')
result = []
for x in test:
groups = pattern.search(x)
if groups:
result.append(groups.group(0))
else:
result.append(None)
OUTPUT:
result
['hard/qtr Mix', 'qtr ', 'hard ', 'hard work', None]
And since you are including the space characters, you may end up with some values that has space at the end, you can just strip off the white space characters later.
Upvotes: 1
Reputation: 2363
You can put all needed words in or
expression and put your word definition after that
import re
words = ['qtr','hard','quarter']
regex = r"(" + "|".join(words) + ")[A-Za-z\/\s]+"
p = re.compile(regex)
test=['id:12345 cli hard/qtr Mix(qtr',
'id:12345 cli qtr 90%',
'id:12345 cli hard (red)',
'id:12345 cli hard work','Hello world']
for string in test:
result = p.search(string)
if result is not None:
print(p.search(string).group(0))
else:
print(result)
Output:
hard/qtr Mix
qtr
hard
hard work
None
Upvotes: 0