Alex

Reputation: 44385

My regex should match 3 times, but matches only twice (PHP)

I have a small code snippet with a regex to match a numerical value, followed by optional text in square brackets and terminated by the character 'A'. For example:

preg_match_all("/([0-9]+)(?:\[(.*)\])?A/", "123[SomeText]A345[SomeOtherText]A678A", $matches);
print_r($matches);

and the expected result is to have three matches:

Array
(
    [0] => Array
        (
            [0] => 123[SomeText]A
            [1] => 345[SomeOtherText]A
            [2] => 678A
        )

    [1] => Array
        (
            [0] => 123
            [1] => 345
            [2] => 678
        )

    [2] => Array
        (
            [0] => SomeText
            [1] => SomeOtherText
            [2] => 
        )
)

while the above code only matches twice:

Array
(
    [0] => Array
        (
            [0] => 123[SomeText]A345[SomeOtherText]A
            [1] => 678A
        )

    [1] => Array
        (
            [0] => 123
            [1] => 678
        )

    [2] => Array
        (
            [0] => SomeText]A345[SomeOtherText
            [1] => 
        )
)

I cannot see what I am doing wrong in the given regex. Any suggestions for fixing the regex are welcome.

Upvotes: 0

Views: 189

Answers (4)

user904550


Well, if you're sure that your texts don't contain the character 'A', this should work:

'/(\d*)([^A]*)/'

But if each part ends where the next one begins, and every part starts with a number, this should work:

'/(\d*)([^\d]*)/'

For example, with this line:

preg_match_all('/(\d*)([^\d]*)/', "123[SomeText]A345[SomeOtherText]A678A", $matches, PREG_SET_ORDER);
print_r($matches);
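A minimal sketch of the second pattern in use, on the subject string from the question (the variable names are illustrative). Because both groups in /(\d*)([^\d]*)/ may match the empty string, preg_match_all can also report an empty match (for example at the end of the subject), so this sketch keeps only the non-empty full matches. Note that group 2 still contains the brackets and the trailing 'A'.

```php
<?php
$subject = "123[SomeText]A345[SomeOtherText]A678A";
preg_match_all('/(\d*)([^\d]*)/', $subject, $matches, PREG_SET_ORDER);

// Both groups may match the empty string, so preg_match_all can also report
// an empty match (e.g. at the end of the subject). Keep only non-empty ones.
$sets = array_values(array_filter($matches, function ($set) {
    return $set[0] !== '';
}));

foreach ($sets as $set) {
    // $set[1] is the number; $set[2] is the rest of the record,
    // which still includes the brackets and the trailing 'A'.
    echo $set[1], ' => ', $set[2], PHP_EOL;
}
```

If you only want the text between the brackets, one of the bracket-aware patterns in the other answers is a closer fit.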

Upvotes: 0

Martin Ender

Reputation: 44279

Your problem is greediness. Quantifiers (like .*) consume as much as possible, and since . can also match ], it goes past the first closing ] to the last one. You could make the repetition ungreedy (matching as little as possible) by using .*?. But there's a better (faster) option in your case: make sure that you cannot go past the closing ] by removing it from the allowed characters:

/([0-9]+)(?:\[([^\]]*)\])?A/
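A small sketch comparing the three options on the subject string from the question: the original greedy .*, the lazy .*?, and the negated class [^\]]*. The variable names are just for illustration.

```php
<?php
$subject = "123[SomeText]A345[SomeOtherText]A678A";

// Greedy: .* first runs to the end of the subject, then backtracks only until
// "]A" can match, which happens at the LAST "]A" in the string.
preg_match_all('/([0-9]+)(?:\[(.*)\])?A/', $subject, $greedy);

// Lazy: .*? grows one character at a time and stops at the FIRST "]A".
preg_match_all('/([0-9]+)(?:\[(.*?)\])?A/', $subject, $lazy);

// Negated class: [^\]]* can never consume a ']', so it cannot run past
// the first closing bracket.
preg_match_all('/([0-9]+)(?:\[([^\]]*)\])?A/', $subject, $negated);

print_r($greedy[2]);  // one bracket group spanning both records
print_r($lazy[2]);    // SomeText, SomeOtherText, ''
print_r($negated[2]); // same as $lazy[2]
```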


Note that you don't actually have to escape either closing bracket. Inside the class, the ] is the first character after the ^, and because empty classes are not allowed, it is treated as a literal. The other one is outside any class, so there is no class for it to close:

/([0-9]+)(?:\[([^]]*)])?A/


Which variant you prefer is a matter of taste, I'd say.
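A quick way to convince yourself that the two variants are equivalent is to run both on the subject string from the question and compare the results. This is only a sketch; the variable names are illustrative.

```php
<?php
$subject = "123[SomeText]A345[SomeOtherText]A678A";

// Escaped closing brackets, both inside and outside the character class.
preg_match_all('/([0-9]+)(?:\[([^\]]*)\])?A/', $subject, $escaped);

// Unescaped: a ']' directly after '[^' is a literal member of the class,
// and a ']' outside any class is an ordinary literal character.
preg_match_all('/([0-9]+)(?:\[([^]]*)])?A/', $subject, $unescaped);

var_dump($escaped === $unescaped); // bool(true)
```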

Upvotes: 3

eisberg

Reputation: 3771

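The .* is greedy, so it runs past the first ]. Replace it with [^A]*, which cannot cross the terminating A (this assumes the bracketed text never contains an A):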

<?php
preg_match_all("/([0-9]+)(?:\[([^A]*)\])?A/", "123[SomeText]A345[SomeOtherText]A678A", $matches);
print_r($matches);

Try it out: http://3v4l.org/BZgmH
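The [^A]* version relies on the bracketed text never containing an uppercase 'A'. The sketch below uses a hypothetical subject that breaks that assumption and contrasts it with the [^\]]* pattern from Martin Ender's answer above.

```php
<?php
// Hypothetical subject: the first bracketed text contains an uppercase 'A'.
$subject = "1[BAD]A2[ok]A";

// [^A]* stops at the 'A' inside "BAD", so the first record cannot match
// and is silently skipped.
preg_match_all('/([0-9]+)(?:\[([^A]*)\])?A/', $subject, $notA);

// [^\]]* only stops at the closing bracket, so both records match.
preg_match_all('/([0-9]+)(?:\[([^\]]*)\])?A/', $subject, $notBracket);

print_r($notA[0]);       // only "2[ok]A"
print_r($notBracket[0]); // "1[BAD]A" and "2[ok]A"
```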

Upvotes: 2

John Woo

Reputation: 263843

How about this pattern?

\d+(\[.*?\])?A
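A minimal sketch of this pattern in a full preg_match_all call on the subject string from the question. Note that this pattern does not capture the number, and its single group keeps the square brackets; the trim() step that removes them is an illustrative addition, not part of the answer.

```php
<?php
$subject = "123[SomeText]A345[SomeOtherText]A678A";

// .*? is lazy, so each bracket group ends at the first ']' followed by 'A'.
preg_match_all('/\d+(\[.*?\])?A/', $subject, $matches);

// Group 1 keeps the brackets; strip them if only the inner text is wanted.
$texts = array_map(function ($s) {
    return trim($s, '[]');
}, $matches[1]);

print_r($matches[0]); // 123[SomeText]A, 345[SomeOtherText]A, 678A
print_r($texts);      // SomeText, SomeOtherText, ''
```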

Upvotes: 0
