Reputation: 11
I have a double challenge. First, I want to match lines that contain two (or eventually more) specified words within certain distance in whatever order.
Using lookaround I manage to select lines matching two or more words, regardless of the order within they occur. I can also easily add more words to be found in the same line, so it this can also be applied without much effort when more word must occur in order to be selected. The disadvantage is that can't detail the maximal distance between them.
^(?=.*\john)(?=.*\jack).*$
By using the pipe operator I can detail both orders in which the terms may occur as well as the accepted distance between them, but when more words should be matched the code becomes lengthy and errorsensitive.
jack.{0,100}john|john.{0,100}jack
Is there a way to combine the respective advantages of both approaches in one regular expression?
Second, ideally I would like that only 'jack' and 'john' (and are selected in the line but not the whole line. Is there a possibility to do this all at once?
Upvotes: 1
Views: 98
Reputation: 174874
For this case, you have to use the second approach. But it can't be possible with regex alone.. You have to ask for language tools help like paste
in-order to build a regex (given in the second format).
In python, I would do like below to create a long regex.
>>> def create_reg(lis):
out = []
for i in lis:
out.append(''.join(i) + '|' + ''.join([i[2],i[1], i[0]]))
return '(?:' + '|'.join(out) + ')'
>>> lst = [('john', '{0,100}', 'jack'), ('foo', '{0,100}', 'bar')]
>>> create_reg(lst)
'(?:john{0,100}jack|jack{0,100}john|foo{0,100}bar|bar{0,100}foo)'
>>>
Upvotes: 1