Reputation: 44385
I have a small code snippet with a regex that should match a numerical value, followed by optional text in square brackets, and terminated by the character 'A'. An example is
preg_match_all("/([0-9]+)(?:\[(.*)\])?A/", "123[SomeText]A345[SomeOtherText]A678A", $matches);
print_r($matches);
and the expected result is to have three matches:
Array
(
[0] => Array
(
[0] => 123[SomeText]A
[1] => 345[SomeOtherText]A
[2] => 678A
)
[1] => Array
(
[0] => 123
[1] => 345
[2] => 678
)
[2] => Array
(
[0] => SomeText
[1] => SomeOtherText
[2] =>
)
)
while the above code only matches twice:
Array
(
[0] => Array
(
[0] => 123[SomeText]A345[SomeOtherText]A
[1] => 678A
)
[1] => Array
(
[0] => 123
[1] => 678
)
[2] => Array
(
[0] => SomeText]A345[SomeOtherText
[1] =>
)
)
I cannot see what I am doing wrong in the given regex. Any suggestions for fixing the regex are welcome.
Upvotes: 0
Views: 189
Reputation:
Well, if you're sure that your texts never contain the character 'A', then this should work:
'/(\d+)([^A]*)/'
But if each part is followed by a new part that begins with a number, this should work:
'/(\d+)([^\d]*)/'
used like this:
preg_match_all('/(\d+)([^\d]*)/', "123[SomeText]A345[SomeOtherText]A678A", $matches, PREG_SET_ORDER);
print_r($matches);
Note that in both patterns the second group also captures the square brackets (and, in the second pattern, the trailing 'A').
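To make the effect of PREG_SET_ORDER concrete, here is a minimal sketch on the question's input string. It uses \d+, i.e. at least one digit, so that no empty matches are produced; that choice and the loop are illustrative assumptions, not a required form.

```php
<?php
// Same subject string as in the question.
$subject = "123[SomeText]A345[SomeOtherText]A678A";

// PREG_SET_ORDER groups the results per match instead of per capture group.
// Assumption: \d+ (at least one digit) is used so empty matches are skipped.
preg_match_all('/(\d+)([^\d]*)/', $subject, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[0] is the full match, $m[1] the number, $m[2] the non-digit tail.
    echo $m[1], ' => ', $m[2], PHP_EOL;
}
```

Each entry of $matches is then one complete match, so the number and its trailing text stay together in the same sub-array.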
Upvotes: 0
Reputation: 44279
Your problem is greediness. Quantifiers (like .*) consume as much as possible. And since . can also match ], it just goes past the first closing ] to the last one. You could make the repetition ungreedy (matching as little as possible) by using .*?. But there's a better (faster) option in your case: make sure that you cannot go past the closing ] by removing it from the allowed characters:
/([0-9]+)(?:\[([^\]]*)\])?A/
Note that you don't actually have to escape the closing bracket inside the class: it is the first character after the ^, and because empty classes are not allowed, a ] in that position is taken literally. The other ] is outside a class, so it can't be read as closing one either:
/([0-9]+)(?:\[([^]]*)])?A/
Which variant you prefer is a matter of taste, I'd say.
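As a quick way to compare the three approaches discussed above (greedy .*, ungreedy .*?, and the negated character class), here is a small sketch run against the question's input. The $patterns and $counts arrays are just illustrative names.

```php
<?php
// Same subject string as in the question.
$subject = "123[SomeText]A345[SomeOtherText]A678A";

// The original greedy pattern and the two suggested alternatives.
$patterns = [
    'greedy'        => '/([0-9]+)(?:\[(.*)\])?A/',
    'lazy'          => '/([0-9]+)(?:\[(.*?)\])?A/',
    'negated class' => '/([0-9]+)(?:\[([^\]]*)\])?A/',
];

$counts = [];
foreach ($patterns as $label => $pattern) {
    // preg_match_all returns the number of full-pattern matches.
    $counts[$label] = preg_match_all($pattern, $subject, $matches);
    echo $label, ': ', $counts[$label], ' match(es), numbers: ',
        implode(',', $matches[1]), PHP_EOL;
}
```

On this input the greedy pattern finds two matches, while the lazy and negated-class variants each find the expected three.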
Upvotes: 3
Reputation: 3771
The .* is greedy, so replace it with a negated character class that stops before the terminating A:
<?php
preg_match_all("/([0-9]+)(?:\[([^A]*)\])?A/", "123[SomeText]A345[SomeOtherText]A678A", $matches);
print_r($matches);
Try it out: http://3v4l.org/BZgmH
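One thing to keep in mind with [^A]*: it relies on the bracketed text never containing an 'A'. The sketch below uses a made-up input, "12[AB]A", to compare it with a class that excludes ] instead; the variable names are illustrative.

```php
<?php
// Hypothetical input whose bracketed text contains an 'A'.
$subject = "12[AB]A";

// Excluding 'A' inside the brackets: the bracketed part can no longer match.
$withNotA = preg_match_all('/([0-9]+)(?:\[([^A]*)\])?A/', $subject, $m1);

// Excluding ']' instead: the bracketed text may contain any other character.
$withNotBracket = preg_match_all('/([0-9]+)(?:\[([^\]]*)\])?A/', $subject, $m2);

echo "[^A]*  : ", $withNotA, " match(es)", PHP_EOL;
echo "[^\\]]* : ", $withNotBracket, " match(es)", PHP_EOL;
```

For this input the [^A]* variant finds no match, while the [^\]]* variant matches once and captures AB as the bracketed text.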
Upvotes: 2