qqqqqqq

Reputation: 2227

Match groups of words with any other words between using regex

I have a few groups of words. E.g.:

["foo", "bar", "baz"]
["hello", "world"]

Now I want a regex that checks that these groups of words appear in the same order as in my list of groups. I would also like to allow any number of other words before, between, or after the groups. E.g. I want the following strings to be matched by my regex:

"foo bar baz hey how do you do hello world"
"foo bar baz hello world"
"something cool foo bar baz maybe hello world"
"something cool foo bar baz maybe hello world amazing"

I am building the regex string dynamically. In my case, this is what I end up with:

"( \S+ )*foo bar baz( \S+ )*hello world( \S+ )*"

I expect \S+ to match one or more characters that are not whitespace, i.e. a single word.

I expect ( \S+ ) to match a single word with a space on each side.

And I expect ( \S+ )* to match any number of words surrounded by spaces.

But it does not work as I expect. E.g. in the string

"something cool foo bar baz maybe hello world amazing"

only this part is matched by the regex, not the whole string:

" cool foo bar baz maybe hello world "

What am I missing in my regex?

I am asking this question while learning regular expressions and practicing on some real-world tasks.

Upvotes: 0

Views: 981

Answers (1)

The fourth bird

Reputation: 163362

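Using ( \S+ )* matches a space both before and after each word, so repeating it only works when there are 2 consecutive spaces between the words. With single spaces, the first repetition consumes the space that the next repetition would need at its start.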
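
To see this in isolation, here is a small sketch in Python (an assumption, since the question does not name a language) that runs the filler group from the question on its own:

```python
import re

# The filler group from the question: space, word, space, repeated.
filler = r"( \S+ )*"

# Words separated by a single space: the first repetition consumes the
# space after "a", so the second repetition has no leading space left.
single_spaced = re.fullmatch(filler, " a b ")

# With two spaces between the words, each repetition gets its own
# leading and trailing space, so the whole string matches.
double_spaced = re.fullmatch(filler, " a  b ")

print(single_spaced is None)      # True
print(double_spaced is not None)  # True
```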

What you could do is drop the leading space from the group before the first list of words, drop the trailing space from the groups between and after the lists, and put a literal space in front of the second list.

(\S+ )*foo bar baz( \S+)* hello world( \S+)*
  • (\S+ )* Optionally repeat 1+ non-whitespace chars followed by a space
  • foo bar baz Match the first list of words
  • ( \S+)* In between the lists, optionally repeat a space and 1+ non-whitespace chars; the literal space that follows the group separates the last of these words from the second list
  • hello world Match the second list of words
  • ( \S+)* Optionally match trailing words
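
Since the pattern is built dynamically, the same layout can be generated from the list of groups. The following is a minimal sketch in Python, which is an assumption because the question does not name a language. `build_pattern` is a hypothetical helper. It uses non-capturing groups `(?:...)` in place of the capturing groups above, and it relies on `re.fullmatch` so that the whole string has to match.

```python
import re

def build_pattern(groups):
    """Build a regex from word groups (hypothetical helper, not from the answer)."""
    # Join the words of each group with single spaces, escaping regex metacharacters.
    joined = [" ".join(re.escape(word) for word in group) for group in groups]
    leading = r"(?:\S+ )*"   # filler words before the first group: word + space
    between = r"(?: \S+)* "  # filler words between groups, then one literal space
    trailing = r"(?: \S+)*"  # filler words after the last group: space + word
    return re.compile(leading + between.join(joined) + trailing)

pattern = build_pattern([["foo", "bar", "baz"], ["hello", "world"]])

# The four strings from the question.
samples = [
    "foo bar baz hey how do you do hello world",
    "foo bar baz hello world",
    "something cool foo bar baz maybe hello world",
    "something cool foo bar baz maybe hello world amazing",
]
results = [pattern.fullmatch(s) is not None for s in samples]
print(results)

# Groups in the wrong order should not match.
wrong_order = pattern.fullmatch("hello world foo bar baz")
print(wrong_order is None)
```

If the regex is used with a search function rather than a full-match function, wrapping it in ^ and $ should give the same whole-string behavior.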


Upvotes: 3
