skyline26

Reputation: 2034

How can I write this regex? (Ungreedy-related)

I'm sorry for the poor title, but it is a very generic question.

I have to match this pattern:

;AAAAAAA(BBBBBB,CCCCC,DDDDDD)

The "all characters between x and y" problem is one that kills me every time.

:(

I'm using PHP and I have to match all occurrences of this pattern (preg_match_all). Sadly, a single occurrence can also span multiple lines.

Thank you in advance!

Upvotes: 3

Views: 95

Answers (2)

Martin Ender

Reputation: 44279

I would recommend that you do not use an ungreedy quantifier, but instead make all repetitions mutually exclusive with their delimiters. What does this mean? It means, for instance, that A can be any character except (. This gives the following regex:

;([^(]*)[(]([^,]*),([^,]*),([^)]*)[)]

The final [)] is not even strictly necessary, since [^)]* already stops right before the closing parenthesis.

The PHP code would then look like this:

preg_match_all('/;([^(]*)[(]([^,]*),([^,]*),([^)]*)[)]/', $input, $matches);
$fullMatches = $matches[0];
$arrayOfAs = $matches[1];
$arrayOfBs = $matches[2];
$arrayOfCs = $matches[3];
$arrayOfDs = $matches[4];
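
To see what ends up in each array, here is a minimal sketch that runs the same pattern. The sample string in $input is a hypothetical assumption, not data from the question; its second record is deliberately split across two lines.

```php
<?php
// Hypothetical sample input: two records, the second one spanning two lines.
$input = "foo;AAA(BBB,CCC,DDD) bar;EEE(FFF,\nGGG,HHH)";

// Same pattern as above; negated character classes also match newlines,
// so no extra modifier is needed for the multi-line record.
$count = preg_match_all('/;([^(]*)[(]([^,]*),([^,]*),([^)]*)[)]/', $input, $matches);

echo "Matches found: $count\n";
print_r($matches[1]); // the "A" parts
print_r($matches[3]); // the "C" parts (the second one keeps its leading newline)
```

Note that the captured C part of the second record still contains the line break, so trimming the captures afterwards may be worthwhile.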

As the comments show, my escaping technique is a matter of taste. This regex is of course equivalent to:

;([^(]*)\(([^,]*),([^,]*),([^)]*)\)

But I think that looks a lot more mismatched/unbalanced than the other variant. Take your pick!

Finally, as for why this approach is better than using ungreedy (lazy) quantifiers: here is some good, general reading. Basically, when you use ungreedy quantifiers, the engine still has to backtrack. It first tries the shortest possible repetition, then notices that the ( after it doesn't match, so it has to go back into the repetition and consume another character. The ( still doesn't match, so it goes back to the repetition again, and so on. With this approach, however, the engine consumes as much as possible when it enters the repetition for the first time, and once all non-( characters are consumed, it can match the following ( right away.
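
On well-formed input, both approaches should capture the same text; the difference lies only in how the engine gets there. The following is a minimal sketch to check that the results agree. The sample string and the variable names ($negated, $lazy, $byNegated, $byLazy) are hypothetical assumptions, and the snippet does not measure performance.

```php
<?php
// Hypothetical sample input with two records.
$input = ";AAA(BBB,CCC,DDD) ;EEE(FFF,GGG,HHH)";

// Negated character classes: each repetition stops at its delimiter on its own.
$negated = '/;([^(]*)[(]([^,]*),([^,]*),([^)]*)[)]/';
// Lazy quantifiers: each repetition is extended one character at a time.
$lazy = '/;(.*?)\((.*?),(.*?),(.*?)\)/s';

preg_match_all($negated, $input, $byNegated);
preg_match_all($lazy, $input, $byLazy);

// Compare the complete result arrays of both patterns.
var_dump($byNegated === $byLazy);
```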

Upvotes: 3

bozdoz

Reputation: 12870

You could use something like this code:

preg_match_all('/;(.*?)\((.*?),(.*?),(.*?)\)/s', $text, $matches);

See it on ideone.com.

Basically, you can use .*? (the question mark makes the quantifier ungreedy), and make sure to escape the literal parentheses. If a match can span multiple lines, you also need the s modifier, because it makes . match newlines too.

The captured groups end up in the $matches array.
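
The effect of the s modifier can be checked with a small sketch. The sample string in $text and the names $withoutS, $withS, $m1 and $m2 are hypothetical assumptions; the second record is deliberately split across two lines.

```php
<?php
// Hypothetical sample input: the second record is split across two lines.
$text = ";AAA(BBB,CCC,DDD) ;EEE(FFF,\nGGG,HHH)";

// Without the s modifier, . does not match "\n", so the split record is missed.
$withoutS = preg_match_all('/;(.*?)\((.*?),(.*?),(.*?)\)/', $text, $m1);
// With the s modifier, . also matches "\n", so both records are found.
$withS = preg_match_all('/;(.*?)\((.*?),(.*?),(.*?)\)/s', $text, $m2);

echo "without s: $withoutS, with s: $withS\n";
```

Here $m2[1] through $m2[4] hold the captured A, B, C and D parts of each record.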

Upvotes: 1
