Reputation: 534
Here is the regex I have so far:
^(?=.*(option1|option2))(?=.*(option3|option4))(?=.*(option5|option6))(?=.*(option7|option8))(?=.*(option9|option10)).*$
I am not hip on the regex language so I'll make my own definitions:
category 1 is (option1|option2), category 2 is (option3|option4), category 3 is (option5|option6), etc.
I would like to capture values where at least 1 option from 3 or more of the categories is found, like this:
some text option3 some more text option8 some more text option1
OR
some text option3 some more text option8 some more text option1 some more text option6
I don't want to capture values like this:
some text option3 some more text option8 - only 2 categories are represented
OR
some text option3 some more text option4 some more text option1 (options 3 and 4 are from the same category)
The options can appear in any order in the text, so that is why I was using the positive lookahead, but I don't know how to put a quantifier on multiple positive lookaheads.
As for the regex engine, I have to use a front-end UI that is powered by Python in the background. I can only use regex; I don't have the ability to use any other Python functions. Thanks!
Upvotes: 4
Views: 1573
Reputation: 75242
Here's a regex that does what you want (in VERBOSE mode):
^
(?= .* (?: option1 | option2 ) () )?
(?= .* (?: option3 | option4 ) () )?
(?= .* (?: option5 | option6 ) () )?
(?= .* (?: option7 | option8 ) () )?
(?= .* (?: option9 | option10 ) () )?
.*$
(?: \1\2\3 | \1\2\4 | \1\2\5 | \1\3\4 | \1\3\5 |
\1\4\5 | \2\3\4 | \2\3\5 | \2\4\5 | \3\4\5 )
The empty groups serve as check boxes: if the enclosing lookahead doesn't succeed, a backreference to that group won't succeed. The non-capturing group at the end contains all possible combinations of three out of five backreferences.
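As a quick check, the pattern can be run through Python's re module with the re.VERBOSE flag against the sample strings from the question. This is only a sketch: the names PATTERN, samples and results are made up for illustration, and it assumes the literal placeholder words option1 through option10.

```python
import re

# The pattern from this answer, compiled in VERBOSE mode so that the
# whitespace and line breaks inside it are ignored.
PATTERN = re.compile(r"""
    ^
    (?= .* (?: option1 | option2  ) () )?
    (?= .* (?: option3 | option4  ) () )?
    (?= .* (?: option5 | option6  ) () )?
    (?= .* (?: option7 | option8  ) () )?
    (?= .* (?: option9 | option10 ) () )?
    .*$
    (?: \1\2\3 | \1\2\4 | \1\2\5 | \1\3\4 | \1\3\5 |
        \1\4\5 | \2\3\4 | \2\3\5 | \2\4\5 | \3\4\5 )
""", re.VERBOSE)

# The four example lines from the question: two wanted, two unwanted.
samples = [
    "some text option3 some more text option8 some more text option1",
    "some text option3 some more text option8 some more text option1 some more text option6",
    "some text option3 some more text option8",
    "some text option3 some more text option4 some more text option1",
]

# True where at least three categories are represented.
results = [PATTERN.search(s) is not None for s in samples]
print(results)
```

Each empty group () is only set when its lookahead succeeds, so an alternative such as \1\2\4 can only match (as an empty string at the end of the line) when categories 1, 2 and 4 were all found.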
The limitations of this approach are obvious; you need only add one more set of options for it to get completely out of hand. I think you'd be better off with a non-regex solution.
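To put a number on that growth: requiring k of n categories takes one alternative per k-sized combination, which is 10 for 3 of 5, 20 for 3 of 6 and 56 for 3 of 8. If the pattern can be generated outside the UI and pasted in, a short script can write it out. The sketch below is one way to do that; build_pattern and its parameters are hypothetical names, and it assumes the options are plain literal strings (they are passed through re.escape).

```python
import re
from itertools import combinations

def build_pattern(categories, k):
    """Build an 'at least k of n categories' regex using the empty-group trick."""
    # One optional lookahead per category; its empty group is set only if
    # one of that category's options occurs somewhere in the line.
    lookaheads = "".join(
        "(?=.*(?:%s)())?" % "|".join(re.escape(opt) for opt in cat)
        for cat in categories
    )
    # Every k-sized combination of backreferences: \1\2\3|\1\2\4|...
    # A line with more than k categories still matches, because any larger
    # set of found categories contains a k-sized subset.
    combos = "|".join(
        "".join("\\%d" % i for i in combo)
        for combo in combinations(range(1, len(categories) + 1), k)
    )
    return "^" + lookaheads + ".*$(?:" + combos + ")"

categories = [
    ("option1", "option2"),
    ("option3", "option4"),
    ("option5", "option6"),
    ("option7", "option8"),
    ("option9", "option10"),
]

pattern = build_pattern(categories, 3)
print(pattern)

# Number of alternatives needed for 3 of 5 and for 3 of 6 categories.
print(len(list(combinations(range(5), 3))), len(list(combinations(range(6), 3))))
```

The generated string is not in VERBOSE mode, so it can be pasted as a single line.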
Upvotes: 1
Reputation: 107297
I don't think this is implementable with a regex, or if it is (perhaps in several steps), it's not the right way to go.
Instead, you can store your options in a set of pairs, like this:
options = {('option1', 'option2'), ('option3', 'option4'), ('option5', 'option6'), ('option7', 'option8'), ('option9', 'option10')}
Then count how many pairs have at least one member in the text (my_text is the string being checked):
if sum(i in my_text or j in my_text for i, j in options) >= 3:
    pass  # do something
Here is a demo:
>>> s1 = "some text option8 some more text option3 some more text option1"
>>> s2 = "some text option3 some more text option4 some more text option1"
>>> s3 = "some text option3 some more text option8"
>>>
>>> options = {('option1', 'option2'), ('option3', 'option4'), ('option5', 'option6'), ('option7', 'option8'), ('option9', 'option10')}
>>>
>>> sum(i in s1 or j in s1 for i, j in options)
3
>>> sum(i in s2 or j in s2 for i, j in options)
2
>>> sum(i in s3 or j in s3 for i, j in options)
2
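The same count can be wrapped in a small helper. One caveat with the literal placeholder names: a plain in test reports option1 as present whenever the text contains option10. The sketch below assumes the options should be matched as whole words and uses \b word boundaries to avoid that; count_categories is a hypothetical name, not something from the demo above.

```python
import re

def count_categories(text, categories):
    """Count the categories that have at least one option present as a whole word."""
    count = 0
    for cat in categories:
        # \b keeps 'option1' from matching inside 'option10'.
        rx = r"\b(?:%s)\b" % "|".join(re.escape(opt) for opt in cat)
        if re.search(rx, text):
            count += 1
    return count

categories = [
    ("option1", "option2"),
    ("option3", "option4"),
    ("option5", "option6"),
    ("option7", "option8"),
    ("option9", "option10"),
]

text = "some text option10 some more text option3"

# Plain substring test: 'option1' is found inside 'option10', so the
# first category is counted even though neither of its options appears.
plain = sum(any(opt in text for opt in cat) for cat in categories)

# Whole-word test: only the second and fifth categories are counted.
whole_word = count_categories(text, categories)

print(plain, whole_word)
```

If the real options never overlap like this, the simpler in test from the demo is enough.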
Upvotes: 1