Spoike

Reputation: 121752

Regex to match string between %

I'm trying to match substrings that are enclosed in % signs, but preg_match_all seems to merge several of them into a single match when they are on the same line.

Code looks like this:

preg_match_all("/%.*%/", "%hey%_thereyou're_a%rockstar%\nyo%there%", $matches);
print_r($matches);

Which produces the following output.

Array
(
    [0] => Array
        (
            [0] => %hey%_thereyou're_a%rockstar%
            [1] => %there%
        )

)

However I'd like it to produce the following array instead:

[0] => %hey%
[1] => %rockstar%
[2] => %there%

What am I missing?

Upvotes: 3

Views: 5032

Answers (7)

Wiktor Stribiżew

Reputation: 626738

While the solution is to turn a greedy .* into a lazy .*? (or replace .* with [^%]*), you might also want to actually get rid of % symbols in the output.

In that case, you will need to use a capturing group and get $matches[1] if a match occurred:

$str = "%hey%_thereyou're_a%rockstar%\nyo%there%";
if (preg_match_all("/%([^%]*)%/", $str, $matches)) {
    print_r($matches[1]);
}
// => Array( [0] => hey [1] => rockstar [2] => there )

Note that print_r($matches[0]); outputs the full matches instead: Array( [0] => %hey% [1] => %rockstar% [2] => %there% ). The [^%] pattern is a negated character class that matches any char other than a % char, including a newline.

See the PHP demo.

Variations

If you need to make sure there are only letters, digits or underscores between % chars, you can use

"/%(\w*)%/"

If you want to match any chars other than % and whitespace between two % chars use

"/%([^\s%]*)%/"

The [^\s%]* pattern matches zero or more chars other than whitespace (\s) and %.
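To see how the two variations differ, here is a small sketch. The sample string is made up for illustration and is not from the question.

```php
<?php
// Hypothetical input: one word-only token, one with a space, one with a hyphen.
$str = "%user_1% %two words% %a-b%";

// \w* allows only letters, digits and underscores between the % signs.
preg_match_all('/%(\w*)%/', $str, $word);

// [^\s%]* allows anything except whitespace and %.
preg_match_all('/%([^\s%]*)%/', $str, $nonSpace);

print_r($word[1]);     // user_1
print_r($nonSpace[1]); // user_1, a-b
```

Neither variation matches %two words%, because the space between the % signs is excluded by both patterns.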

Upvotes: 0

Pradeep Kumar

Reputation:

|%(\w+)%| should do what you want, as long as the text between the % signs contains only letters, digits, or underscores.

Upvotes: 1

Alix Axel

Reputation: 154513

Add a ? after the *:

preg_match_all("/%.*?%/", "%hey%_thereyou're_a%rockstar%\nyo%there%", $matches);

Upvotes: 2

Tom Haigh

Reputation: 57815

You could try /%[^%]+%/ - this means in between the percent signs you only want to match characters which are not percent signs.

You could also make the pattern ungreedy with the U modifier, e.g. /%.+%/U, so that .+ matches as little as possible.
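A quick sketch of both suggestions, run against the string from the question. The variable names are only for illustration.

```php
<?php
$str = "%hey%_thereyou're_a%rockstar%\nyo%there%";

// Negated character class: one or more non-% characters between the signs.
preg_match_all('/%[^%]+%/', $str, $byClass);

// U (PCRE_UNGREEDY) inverts greediness, so .+ behaves like .+? here.
preg_match_all('/%.+%/U', $str, $byFlag);

print_r($byClass[0]); // %hey%, %rockstar%, %there%
print_r($byFlag[0]);  // %hey%, %rockstar%, %there%
```

Because both patterns use + rather than *, an empty pair such as %% would not be matched as a token on its own.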

Upvotes: 1

fresskoma

Reputation: 25781

The reason is that the star is greedy. That is, the star causes the regex engine to repeat the preceding token as often as possible. You should try .*? instead.
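As a minimal illustration of the difference, the sketch below runs preg_match (first match only) on the first line of the question's input with a greedy and a lazy star.

```php
<?php
$line = "%hey%_thereyou're_a%rockstar%";

// Greedy: .* runs to the end of the line, then backtracks to the last %.
preg_match('/%.*%/', $line, $greedy);

// Lazy: .*? stops at the first % it can reach.
preg_match('/%.*?%/', $line, $lazy);

echo $greedy[0], "\n"; // %hey%_thereyou're_a%rockstar%
echo $lazy[0], "\n";   // %hey%
```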

Upvotes: 1

Greg

Reputation: 321588

You're doing a greedy match - use ? to make it ungreedy:

/%.*?%/

If a newline can occur inside the match, add the s (DOTALL) modifier:

/%.*?%/s
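Here is a small sketch of what the s modifier changes. The input is a made-up string in which one token spans a line break.

```php
<?php
// Hypothetical input: the first token contains a newline.
$str = "%multi\nline% and %one%";

// Without s, the dot cannot cross the newline, so the first % finds no partner
// and the % signs end up paired incorrectly.
preg_match_all('/%.*?%/', $str, $plain);

// With s, the dot also matches newlines.
preg_match_all('/%.*?%/s', $str, $dotall);

print_r($plain[0]);  // "% and %"
print_r($dotall[0]); // "%multi\nline%", "%one%"
```

The first result shows why the modifier matters: without it the engine does not just skip the multi-line token, it pairs the wrong % signs.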

Upvotes: 4

Adam Batkin

Reputation: 52984

Replace the "." in your regular expression with "[^%]":

preg_match_all("/%[^%]*%/", "%hey%_thereyou're_a%rockstar%\nyo%there%", $matches);

What is happening is that the ".*" is "greedily" matching as much as it possibly can, including everything up to the final % on the line. Replacing the "." with the negated character class "[^%]" means that it will instead match anything except a percent sign, which makes it match just the bits that you want.

Another option would be to place a "?" after the "*", which tells it "don't be greedy":

preg_match_all("/%.*?%/", "%hey%_thereyou're_a%rockstar%\nyo%there%", $matches);

In the example above, either option works. However, when the closing delimiter is longer than a single character, a negated character class will not help, and the solution is to make the match non-greedy.
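To illustrate that last point, here is a sketch that assumes a two-character delimiter, %%, which is not part of the original question. The sample string is made up.

```php
<?php
// Hypothetical input: tokens are wrapped in %% and may contain a single %.
$str = "%%a%b%% and %%c%%";

// Lazy match: stops at the first closing %%.
preg_match_all('/%%(.*?)%%/', $str, $lazy);

// Negated class: cannot step over the single % inside the first token,
// so it ends up pairing the wrong delimiters.
preg_match_all('/%%([^%]*)%%/', $str, $negated);

print_r($lazy[1]);    // "a%b", "c"
print_r($negated[1]); // " and "
```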

Upvotes: 12
