shajji

Reputation: 1667

Regex: Specified words in any order

I'm not good at regex and am trying to write two regular expressions.

Regex1:

All specified words in any order but nothing else. (repetition allowed).

Regex2:

All specified words in any order but nothing else. (repetition not allowed).

Words:

aaa, bbb, ccc

Strings:

aaa ccc bbb
aaa ccc
aaa bbb ddd ccc
bbb aaa bbb ccc

Regex1 should evaluate the above strings as:

true -> all words present in any order
false -> bbb is missing
false -> unknown word 'ddd'
true -> all words present in any order, and repetition is allowed

Regex2 should evaluate the above strings as:

true -> all words present in any order
false -> bbb is missing
false -> unknown word 'ddd'
false -> repetition is not allowed

My Attempt

/^(?=.*\baaa\b)(?=.*\bbbb\b)(?=.*\bccc\b).*$/

I'm asking for learning purposes, so please explain your answer.

Upvotes: 15

Views: 1208

Answers (4)

bobble bubble

Reputation: 18490

Without repetition (demo at regex101):

^(?:(aaa|bbb|ccc)(?!.*?\b\1) ?\b){3}$

And with repetition (demo at regex101):

^(?=.*?\baaa)(?=.*?\bbbb)(?=.*?\bccc)(?:(aaa|bbb|ccc) ?\b)+$

Two more ideas. An explanation of each regex is shown on the right-hand side at regex101.
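To see how the two patterns behave, here is a minimal sketch that runs both against the four strings from the question. The patterns are copied from this answer; the variable names `withoutRepetition`, `withRepetition`, `samples` and `results` are made up for the illustration.

```javascript
// Patterns copied from the answer; the variable names are hypothetical.
const withoutRepetition = /^(?:(aaa|bbb|ccc)(?!.*?\b\1) ?\b){3}$/;
const withRepetition = /^(?=.*?\baaa)(?=.*?\bbbb)(?=.*?\bccc)(?:(aaa|bbb|ccc) ?\b)+$/;

// The four test strings from the question.
const samples = ['aaa ccc bbb', 'aaa ccc', 'aaa bbb ddd ccc', 'bbb aaa bbb ccc'];

// Evaluate every string against both patterns.
const results = samples.map((input) => ({
  input: input,
  withoutRepetition: withoutRepetition.test(input),
  withRepetition: withRepetition.test(input),
}));

for (const row of results) {
  console.log(row.input, '->', row.withoutRepetition, row.withRepetition);
}
```

Only the last string, 'bbb aaa bbb ccc', is treated differently: the first pattern rejects it because of the `(?!.*?\b\1)` lookahead, while the second accepts it.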

Upvotes: 6

Christoph Herold

Reputation: 1809

For Regex 1:

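// Lookaheads require each word; the body allows only the listed words, separated by spaces
var re = /^(?=.*?\baaa\b)(?=.*?\bbbb\b)(?=.*?\bccc\b)\b(?:aaa|bbb|ccc)\b(?: +\b(?:aaa|bbb|ccc)\b)*$/;
var res = document.getElementById('result');
res.innerText += re.test('aaa ccc bbb');          // true
res.innerText += ', ' + re.test('aaa ccc ddd');   // false: 'bbb' missing, 'ddd' unknown
res.innerText += ', ' + re.test('aaa ddd bbb');   // false: 'ccc' missing, 'ddd' unknown
res.innerText += ', ' + re.test('ccc bbb ccc');   // false: 'aaa' missing
<div id="result"></div>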

Your code already does part of the trick. Your positive lookaheads check that all words appear somewhere, but not that they are the only words present. To achieve this, the circumflex (^) anchors the match at the start of the string. Then the group \b(?:aaa|bbb|ccc)\b matches the first instance of any word. This is followed by any number of further words, each preceded by at least one space: (?: +\b(?:aaa|bbb|ccc)\b)*, which is the same pattern with ' +' (one or more spaces) in front, wrapped in a *. Finally, the string has to end there, which is enforced by the dollar sign ($).

For Regex 2:

The basic strategy is the same. You just add a negative lookahead to check that a matched word does not occur again:

// First version: one positive and one negative lookahead per word
//var re = /^(?=.*?\baaa\b)(?!.*?\baaa\b.*?\baaa\b)(?=.*?\bbbb\b)(?!.*?\bbbb\b.*?\bbbb\b)(?=.*?\bccc\b)(?!.*?\bccc\b.*?\bccc\b)\b(?:aaa|bbb|ccc)\b(?:\s+\b(?:aaa|bbb|ccc)\b)*$/;
// More concise version, see the update below
var re = /^(?=.*?\baaa\b)(?=.*?\bbbb\b)(?=.*?\bccc\b)(?!.*?\b(\w+)\b.*?\b\1\b)\b(?:aaa|bbb|ccc)\b(?: +\b(?:aaa|bbb|ccc)\b)*$/;
var res = document.getElementById('result');
res.innerText += re.test('aaa ccc bbb');             // true
res.innerText += ', ' + re.test('aaa ccc ddd');      // false: 'bbb' missing, 'ddd' unknown
res.innerText += ', ' + re.test('aaa bbb aaa');      // false: 'ccc' missing, 'aaa' repeated
res.innerText += ', ' + re.test('aaa ccc bbb ccc');  // false: 'ccc' repeated
<div id="result"></div>

First, the positive lookahead (?=.*?\baaa\b) checks that the word exists. It is followed by the negative lookahead (?!.*?\baaa\b.*?\baaa\b), which checks that the word does not occur more than once. Repeat for all words. Presto!

Update: Instead of checking the specific words aren't repeated, we can also check that NO word is repeated by using the (?!.*?\b(\w+)\b.*?\b\1\b) construct. This makes the regex more concise. Thanks to @revo for pointing it out.
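If the word list is not fixed, the same two patterns could be assembled from an array. The following is only a sketch: `escapeRegExp` and `buildWordSetRegex` are hypothetical helper names, not part of any library, and the code assumes every word consists of \w characters so that \b behaves as described above.

```javascript
// Hypothetical helper: escape characters that are special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build a pattern like the ones above from a word list.
// Assumes each word is made of \w characters only.
function buildWordSetRegex(words, allowRepetition) {
  const alternation = words.map(escapeRegExp).join('|');
  // One positive lookahead per word: each must appear somewhere.
  const required = words
    .map((w) => `(?=.*?\\b${escapeRegExp(w)}\\b)`)
    .join('');
  // Negative lookahead from the update: no whole word may occur twice.
  const noRepeat = allowRepetition ? '' : '(?!.*?\\b(\\w+)\\b.*?\\b\\1\\b)';
  // Body: only listed words, separated by one or more spaces.
  const body = `\\b(?:${alternation})\\b(?: +\\b(?:${alternation})\\b)*`;
  return new RegExp(`^${required}${noRepeat}${body}$`);
}

const regex1 = buildWordSetRegex(['aaa', 'bbb', 'ccc'], true);
const regex2 = buildWordSetRegex(['aaa', 'bbb', 'ccc'], false);

console.log(regex1.test('bbb aaa bbb ccc')); // repetition allowed
console.log(regex2.test('bbb aaa bbb ccc')); // repetition rejected
```

The number of lookaheads grows with the word list, so for long lists a non-regex approach is probably easier to maintain.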

Upvotes: 3

Sergej

Reputation: 2196

Do not use regex for uniqueness.

To match whole words in a regex, though, you can use \b (a word boundary).

Example: /\b(word1|word2|word3)\b/
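A small sketch of what \b changes, using the words from the question instead of word1, word2 and word3. The variable names `wholeWord` and `loose` are made up for the illustration.

```javascript
// With \b the alternation only matches complete words.
const wholeWord = /\b(aaa|bbb|ccc)\b/;
// Without \b it also matches inside longer words.
const loose = /(aaa|bbb|ccc)/;

console.log(wholeWord.test('xx aaa yy')); // true: 'aaa' stands alone
console.log(wholeWord.test('xxaaayy'));   // false: 'aaa' is part of a longer word
console.log(loose.test('xxaaayy'));       // true: no boundary check
```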

Upvotes: 1

shikai ng

Reputation: 135

Why do you need a regex for this, though? You could achieve what you want easily by first splitting each string on its delimiter (a space in your examples). You can then create a dictionary object with the words you are looking for as keys and all values defaulted to -1.

The variant that allows repetition can be achieved by looping through the input words and checking that each one exists as a key in the dictionary object. The variant that forbids repetition works similarly, except that when a key matches an input word its value is changed to 1, and if that key is visited again a false match is returned.
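A rough sketch of this idea, under a few assumptions: the input is split on whitespace, a `Map` of counters stands in for the dictionary with -1 defaults, and a final check that every word was seen is added, since otherwise 'aaa ccc' would be accepted. The function name `containsExactlyWords` is hypothetical.

```javascript
// Hypothetical helper implementing the split-and-look-up idea without a regex.
function containsExactlyWords(input, words, allowRepetition) {
  // Count occurrences of each allowed word, starting from zero.
  const counts = new Map(words.map((w) => [w, 0]));
  // Split on runs of whitespace; an empty input yields no tokens.
  const tokens = input.trim().split(/\s+/).filter((t) => t.length > 0);
  for (const token of tokens) {
    if (!counts.has(token)) return false; // unknown word
    const seen = counts.get(token) + 1;
    if (!allowRepetition && seen > 1) return false; // repeated word
    counts.set(token, seen);
  }
  // Every required word must have been seen at least once.
  return [...counts.values()].every((n) => n > 0);
}

const wanted = ['aaa', 'bbb', 'ccc'];
console.log(containsExactlyWords('bbb aaa bbb ccc', wanted, true));
console.log(containsExactlyWords('bbb aaa bbb ccc', wanted, false));
```

Compared with the regex versions, the word list can change at run time without rebuilding a pattern, and each rejection reason is an explicit line of code.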

Upvotes: 2
