Reputation: 81
I am trying to write a regex pattern that will match any string containing the words '28', 'bonus', and 'day'.
At the moment I have come up with this:
(bonus|(days|day)|(28th|28)|twenty[ \-\t]*(eighth|eight))[ \ta-z]*(bonus|days|day|(28th|28)|twenty[ \-\t]*(eighth|eight))[ \ta-z]*(bonus|days|day|(28th|28)|twenty[ \-\t]*(eighth|eight))
You can view the results here: https://regex101.com/r/oOcGqk/8
The trouble I am having is that any word can be used multiple times, and still be matched. For example: 'day day bonus', 'bonus bonus bonus'. How can I exclude strings that use any of these words ('28', 'bonus', 'day') more than once?
Upvotes: 0
Views: 78
Reputation: 336158
With a decent regex engine, you could make use of a nice trick:
^ # Start of string
(?=(?:(?!bonus).)*bonus()(?:(?!bonus).)*$)
# Explanation: This lookahead assertion makes sure that "bonus" occurs exactly once
# in the string. It doesn't actually match any text, it just "looks ahead" to see if
# that condition is met. However, it contains an empty capturing group "()" that only
# participates in the match if the lookahead assertion succeeds. We can check this later.
(?=(?:(?!days?).)*days?()(?:(?!days?).)*$)
(?=(?:(?!28(?:th)?|twenty-eighth?).)*(?:28(?:th)?|twenty-eighth?)()(?:(?!28(?:th)?|twenty-eighth?).)*$)
[\w\s]* # Match a string that only contains alnum character or whitespace
\1\2\3 # Assert that all three words participated in the match
$ # End of string.
You can test this here
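For readers who want to try the verbose pattern locally, here is a minimal sketch in Python. It assumes Python's `re` module with the `re.VERBOSE` flag as the engine (the answer does not name one), and the helper name `matches_once` is made up for illustration.

```python
import re

# The verbose pattern from the answer, compiled with re.VERBOSE so that
# whitespace and "#" comments inside the pattern are ignored.
EXACTLY_ONCE = re.compile(r"""
    ^                                           # start of string
    (?=(?:(?!bonus).)*bonus()(?:(?!bonus).)*$)  # "bonus" exactly once
    (?=(?:(?!days?).)*days?()(?:(?!days?).)*$)  # "day"/"days" exactly once
    (?=(?:(?!28(?:th)?|twenty-eighth?).)*(?:28(?:th)?|twenty-eighth?)()(?:(?!28(?:th)?|twenty-eighth?).)*$)
    [\w\s]*                                     # only word characters and whitespace
    \1\2\3                                      # all three empty groups participated
    $                                           # end of string
""", re.VERBOSE)


def matches_once(text):
    """Hypothetical helper: True if the whole string satisfies the pattern."""
    return EXACTLY_ONCE.search(text) is not None


print(matches_once("bonus day 28"))      # each word once
print(matches_once("day day bonus 28"))  # "day" repeated
```

The order of the three words does not matter here, because each lookahead scans the whole string independently from the start.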
In JavaScript, you'll have to spell out all possible permutations. Unfortunately, JS doesn't even allow verbose regexes, so it's going to be monstrous.
Just as a starting point: The following regex will match strings that contain bonus, days and 28 exactly once, but it only allows them in the order "bonus, days and 28" or "days, bonus and 28". You'd need to add the other four permutations to get a complete regex (and a complete mess). Do this programmatically, not with a regex.
^(?:(?:(?!bonus|days?|28(?:th)?|twenty-eighth?).)*bonus(?:(?!bonus|days?|28(?:th)?|twenty-eighth?).)*days?(?:(?!bonus|days?|28(?:th)?|twenty-eighth?).)*(?:28(?:th)?|twenty-eighth?)(?:(?!bonus|days?|28(?:th)?|twenty-eighth?).)*|(?:(?!bonus|days?|28(?:th)?|twenty-eighth?).)*days?(?:(?!bonus|days?|28(?:th)?|twenty-eighth?).)*bonus(?:(?!bonus|days?|28(?:th)?|twenty-eighth?).)*(?:28(?:th)?|twenty-eighth?)(?:(?!bonus|days?|28(?:th)?|twenty-eighth?).)*)$
Test it here. You have been warned.
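As a rough illustration of what "programmatically" could look like, the sketch below counts the occurrences of each word separately and requires each count to be exactly one. It is written in Python as an assumption (the question does not say which language is in use), and the names `WORD_PATTERNS` and `has_each_word_once` are hypothetical. The sub-patterns are the same alternatives used in the regexes above.

```python
import re

# Hypothetical table of word patterns, reusing the alternatives from the
# regexes above: "bonus", "day"/"days", and "28"/"28th"/"twenty-eight(h)".
WORD_PATTERNS = {
    "bonus": re.compile(r"bonus"),
    "day": re.compile(r"days?"),
    "28": re.compile(r"28(?:th)?|twenty-eighth?"),
}


def has_each_word_once(text):
    """Return True if every word pattern occurs exactly once in text."""
    return all(len(p.findall(text)) == 1 for p in WORD_PATTERNS.values())


print(has_each_word_once("28 days bonus"))      # any order is accepted
print(has_each_word_once("bonus bonus bonus"))  # repeated word is rejected
```

Because each word is counted on its own, no permutations need to be spelled out, and adding a fourth word is one more dictionary entry.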
Upvotes: 1
Reputation: 1865
I think this regular expression is the solution:
(?=.*bonus)(?=.*day)(?=.*(?:28|twenty\s*-?\s*eight)).*
Upvotes: 1