colio303

Reputation: 81

Regex: Match list of words without reusing previously matched words

I am trying to write a regex pattern that will match any string containing the words '28', 'bonus', and 'day'.

At the moment I have come up with this:

(bonus|(days|day)|(28th|28)|twenty[ \-\t]*(eighth|eight))[ \ta-z]*(bonus|days|day|(28th|28)|twenty[ \-\t]*(eighth|eight))[ \ta-z]*(bonus|days|day|(28th|28)|twenty[ \-\t]*(eighth|eight))

You can view the results here: https://regex101.com/r/oOcGqk/8

The trouble I am having is that any word can be used multiple times, and still be matched. For example: 'day day bonus', 'bonus bonus bonus'. How can I exclude strings that use any of these words ('28', 'bonus', 'day') more than once?

Upvotes: 0

Views: 78

Answers (2)

Tim Pietzcker

Reputation: 336158

With a decent regex engine, you could make use of a nice trick:

^     # Start of string
(?=(?:(?!bonus).)*bonus()(?:(?!bonus).)*$) 
# Explanation: This lookahead assertion makes sure that "bonus" occurs exactly once 
# in the string. It doesn't actually match any text, it just "looks ahead" to see if 
# that condition is met. However, it contains an empty capturing group "()" that only 
# participates in the match if the lookahead assertion succeeds. We can check this later.
(?=(?:(?!days?).)*days?()(?:(?!days?).)*$)
(?=(?:(?!28(?:th)?|twenty-eighth?).)*(?:28(?:th)?|twenty-eighth?)()(?:(?!28(?:th)?|twenty-eighth?).)*$)
[\w\s-]*  # Match a string that contains only word characters, whitespace or hyphens (the hyphen is needed for "twenty-eighth")
\1\2\3   # Assert that all three words participated in the match
$        # End of string.

You can test this here
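For anyone who wants to try the trick outside an online tester, here is a minimal sketch in Python, whose `re` module supports verbose patterns and fails a backreference to a group that did not participate. The name `EXACTLY_ONCE` is made up for this example. The body class used here is `[\w\s-]*`, so the hyphenated spelling can pass; treat that as an assumption about the intent.

```python
import re

# Verbose-mode version of the lookahead pattern. Each lookahead scans the
# whole string and requires its word to occur exactly once; the empty
# group "()" inside it is then checked by the backreferences at the end.
EXACTLY_ONCE = re.compile(r"""
    ^
    (?=(?:(?!bonus).)*bonus()(?:(?!bonus).)*$)
    (?=(?:(?!days?).)*days?()(?:(?!days?).)*$)
    (?=(?:(?!28(?:th)?|twenty-eighth?).)*(?:28(?:th)?|twenty-eighth?)()(?:(?!28(?:th)?|twenty-eighth?).)*$)
    [\w\s-]*   # assumption: hyphen allowed so "twenty-eighth" can pass
    \1\2\3     # all three empty groups must have participated
    $
""", re.VERBOSE)

def matches(text: str) -> bool:
    """Return True if text contains each of the three words exactly once."""
    return EXACTLY_ONCE.search(text) is not None

if __name__ == "__main__":
    for sample in ["28 day bonus", "day day bonus 28", "bonus bonus bonus",
                   "bonus day twenty-eighth"]:
        print(repr(sample), matches(sample))
```

Note that the words are not anchored with word boundaries, so "today" would count as "day"; adding `\b` around each word is left out to stay close to the pattern above.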

In JavaScript, you'll have to spell out all possible permutations. Unfortunately, JS doesn't even allow verbose regexes, so it's going to be monstrous.

Just as a starting point: the following regex will match strings that contain bonus, days and 28 exactly once each, but it only allows them in the order "bonus, days, 28" or "days, bonus, 28". You'd need to add the other four permutations to get a complete regex (and a complete mess). Do this programmatically, not with a regex.

^(?:(?:(?!bonus|days?|28(?:th)?|twenty-eighth?).)*bonus(?:(?!bonus|days?|28(?:th)?|twenty-eighth?).)*days?(?:(?!bonus|days?|28(?:th)?|twenty-eighth?).)*(?:28(?:th)?|twenty-eighth?)(?:(?!bonus|days?|28(?:th)?|twenty-eighth?).)*|(?:(?!bonus|days?|28(?:th)?|twenty-eighth?).)*days?(?:(?!bonus|days?|28(?:th)?|twenty-eighth?).)*bonus(?:(?!bonus|days?|28(?:th)?|twenty-eighth?).)*(?:28(?:th)?|twenty-eighth?)(?:(?!bonus|days?|28(?:th)?|twenty-eighth?).)*)$

Test it here. You have been warned.
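To illustrate what "programmatically" could look like, here is a rough sketch in Python rather than JavaScript. The names `WORD_FAMILIES` and `has_each_exactly_once` are invented for this example, and the spellings accepted for each word are simply copied from the regexes above; it counts occurrences of each word family and requires every count to be exactly one, in any order.

```python
import re

# One small pattern per required word family (hypothetical names; the
# accepted spellings mirror the alternatives used in the regexes above).
WORD_FAMILIES = {
    "bonus": re.compile(r"bonus"),
    "day": re.compile(r"days?"),
    "28": re.compile(r"28(?:th)?|twenty-eighth?"),
}

def has_each_exactly_once(text: str) -> bool:
    """Return True if every word family occurs exactly once in text."""
    return all(len(pattern.findall(text)) == 1
               for pattern in WORD_FAMILIES.values())

if __name__ == "__main__":
    for sample in ["bonus day 28", "day day bonus 28", "bonus bonus bonus"]:
        print(repr(sample), has_each_exactly_once(sample))
```

Because each family is counted separately, no permutations need to be spelled out, and adding a fourth word is one more dictionary entry. Matching is case-sensitive here; passing `re.IGNORECASE` to each `re.compile` call would relax that.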

Upvotes: 1

Maciej Kozieja

Reputation: 1865

I think this regex expression is the solution:

(?=.*bonus)(?=.*day)(?=.*(?:28|twenty\s*-?\s*eight)).*
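
A quick way to see how a lookahead-only pattern of this shape behaves is to run it in Python. In this sketch the name `CONTAINS_ALL` is made up, and the number alternatives are wrapped in a non-capturing group so that `.*` applies to both of them. As far as I can tell, the pattern checks only that each word is present, so a string that repeats a word is still accepted.

```python
import re

# Presence check only: each lookahead requires its word somewhere ahead.
# The (?:...) around the number alternatives is an explicit grouping so
# that ".*" applies to both "28" and the spelled-out form.
CONTAINS_ALL = re.compile(r"(?=.*bonus)(?=.*day)(?=.*(?:28|twenty\s*-?\s*eight)).*")

def contains_all(text: str) -> bool:
    """Return True if text contains all three words at least once."""
    return CONTAINS_ALL.search(text) is not None

if __name__ == "__main__":
    for sample in ["bonus day twenty eight", "day day bonus 28", "bonus 28"]:
        print(repr(sample), contains_all(sample))
```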

Upvotes: 1
