conbask

Capture exactly 2 words from string

I'm using regex to parse a command that looks like this:

!hello foo bar

I would like to capture foo and bar. If the command is passed anything but 2 arguments then I want the regex to fail.

Here's my regex so far:

^!hello (.*)$

I know that {2} can be used to limit the amount captured, but I'm not sure exactly how to use it in this situation.

Thanks


Answers (2)

GVH

.* captures everything, including whitespace. What you want to do is capture a run of one or more characters that can be anything but whitespace, then some whitespace, then another run of non-whitespace characters.
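Applied to the question's pattern, this means the argument count is never checked: any text after "!hello " ends up in group 1, including three arguments or none at all. A minimal PHP sketch (the sample commands are made up for illustration):

```php
<?php
// The question's original pattern: (.*) swallows everything after "!hello ".
$original = '/^!hello (.*)$/';

foreach (['!hello foo bar', '!hello foo bar baz', '!hello '] as $command) {
    $matched = preg_match($original, $command, $m);
    // Every one of these commands matches, so the number of arguments is never enforced.
    printf("%-22s => %d, group 1 = '%s'\n", $command, $matched, $m[1]);
}
```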

The way to capture this using regex syntax is:

^!hello\s+(\S+)\s+(\S+)\s*$
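As a sketch of how this pattern behaves from PHP (the test commands are illustrative only), `preg_match` fills `$m[1]` and `$m[2]` with the two arguments when exactly two are present, and returns 0 otherwise:

```php
<?php
// GVH's pattern: two runs of non-whitespace separated by whitespace,
// with optional trailing whitespace before the end of the string.
$pattern = '/^!hello\s+(\S+)\s+(\S+)\s*$/';

$commands = ['!hello foo bar', "!hello foo bar  ", '!hello foo bar baz', '!hello foo'];
foreach ($commands as $command) {
    if (preg_match($pattern, $command, $m)) {
        // $m[0] is the whole match; $m[1] and $m[2] are the two arguments.
        echo "match:    [{$m[1]}] [{$m[2]}]\n";
    } else {
        echo "no match: '{$command}'\n";
    }
}
```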

Note the use of + instead of *: there must be at least one whitespace character between the words (zero is not acceptable), and each word must be at least one character long. The final \s* also allows a run of trailing whitespace at the end of the command.
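To see why the separators need + rather than *, here is a sketch comparing GVH's pattern with a hypothetical variant (`$loose`, made up for illustration) that uses \s* between the words. With \s*, a single argument still matches, because backtracking splits it into two non-empty groups:

```php
<?php
// Hypothetical variant: the separators use \s* instead of \s+ (illustration only).
$loose  = '/^!hello\s*(\S+)\s*(\S+)\s*$/';
// GVH's pattern: at least one whitespace character between the words.
$strict = '/^!hello\s+(\S+)\s+(\S+)\s*$/';

// With \s*, the single argument "foobar" is split by backtracking
// so that both groups get at least one character.
preg_match($loose, '!hello foobar', $m);
echo "loose:  [{$m[1]}] [{$m[2]}]\n";

// With \s+, the same single-argument command is rejected.
echo 'strict: ', preg_match($strict, '!hello foobar'), "\n";
```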

Note that \S matches any character that is not whitespace. This means that

!hello %__ second_word

would match. If you want to match only word characters (letters, digits, and underscore) in the arguments, use \w instead of \S. See the PHP manual for definitions of the different generic character types, or for instructions on creating your own character class.
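A short sketch of the difference, using the same kind of command as the example above (with the !hello prefix the pattern requires): only the \S version accepts "%__" as an argument.

```php
<?php
$withS = '/^!hello\s+(\S+)\s+(\S+)\s*$/';  // any non-whitespace characters
$withW = '/^!hello\s+(\w+)\s+(\w+)\s*$/';  // word characters only: letters, digits, _

$command = '!hello %__ second_word';
echo preg_match($withS, $command), "\n";  // 1: "%__" is non-whitespace
echo preg_match($withW, $command), "\n";  // 0: "%" is not a word character
```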

Michael Berkowski

Instead of the greedy (.*), I would recommend something more specific, such as \w+ to match one or more "word" characters. Since the amount of whitespace between the arguments probably doesn't matter, separate them with \s+. Rather than trying to use {2}: since you expect exactly two arguments separated by whitespace, it is easier to spell each group out literally as \w+ with the whitespace requirement in between.

^!hello\s+(\w+)\s+(\w+)$
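For comparison with the {2} idea from the question, here is a sketch of a hypothetical pattern that applies {2} to a repeated capturing group. It does enforce exactly two arguments, but a repeated group is still a single group, so after the match it only holds the text from its last iteration and the first argument is lost:

```php
<?php
// Hypothetical {2} version: one capturing group, repeated twice.
$repeated = '/^!hello(?:\s+(\w+)){2}$/';
preg_match($repeated, '!hello foo bar', $m);
print_r($m);  // group 1 holds only "bar"; "foo" is lost

// Spelling the two groups out keeps both arguments.
$spelled = '/^!hello\s+(\w+)\s+(\w+)$/';
preg_match($spelled, '!hello foo bar', $m);
print_r($m);  // group 1 is "foo", group 2 is "bar"
```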

If you don't actually need to reuse the arguments, remove the ().

^!hello\s+\w+\s+\w+$

<?php
$pattern = '/^!hello\s+\w+\s+\w+$/';
echo preg_match($pattern, '!hello foo bar'), PHP_EOL;
// 1
echo preg_match($pattern, '!hello foo bar baz'), PHP_EOL;
// 0
echo preg_match($pattern, '!hello "foo bar" baz'), PHP_EOL;
// 0
// Note a numeric argument matches \w+... If that isn't allowed
// you should use [A-Za-z]+ instead or just [a-z]+ and add the /i flag
echo preg_match($pattern, '!hello 123 baz'), PHP_EOL;
// 1
echo preg_match($pattern, '!hello a$1 baz'), PHP_EOL;
// 0
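A sketch of the letters-only variant suggested in the comment above, using [a-z]+ with the /i modifier. Be aware that /i applies to the whole pattern, so the !hello prefix also becomes case-insensitive (for example, "!HELLO foo bar" would match too).

```php
<?php
// Letters only; the /i modifier makes the whole pattern case-insensitive.
$lettersOnly = '/^!hello\s+[a-z]+\s+[a-z]+$/i';

echo preg_match($lettersOnly, '!hello foo bar'), PHP_EOL;  // 1
echo preg_match($lettersOnly, '!hello Foo BAR'), PHP_EOL;  // 1: /i ignores case
echo preg_match($lettersOnly, '!hello 123 baz'), PHP_EOL;  // 0: digits are rejected
```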
