salty00

Reputation: 43

Making a regex pattern created for C# work in PHP

I found an answer on Stack Overflow that describes how to use a regex pattern to extract (nested) functions and their arguments from a string.

The user also provides an example for which the pattern should work. After extracting the arguments from the outermost function (which works fine in PHP too), he matches the following text:

a,b,func1(a,b+c),func2(a*b,func3(a+b,c)),func4(e)+func5(f),func6(func7(g,h)+func8(i,(a)=>a+2)),g+2

with the following pattern

(?:[^,()]+((?:\((?>[^()]+|\((?<open>)|\)(?<-open>))*(?(open)(?!))\)))*)+

The following (simpler) pattern is also supposed to work

(?:[^,()]+((?:\((?>[^()]+|\((?<open>)|\)(?<-open>))*\)))*)+

Note that the user was using C# for his example. I want to get it to work in PHP, but I can't manage to do it.

When I try to match the example string using preg_match($pattern, $string, $matches), it doesn't return any matches.

If I enter the pattern and the example on regex101, it gives me errors that I can't fix (I guess it has something to do with the <open> part of the pattern).

I'm not great with regex, so can someone please tell me what I need to change in the pattern for it to work with PHP?

Upvotes: 2

Views: 216

Answers (1)

Lucas Trzesniewski

Reputation: 51400

PHP uses the PCRE regex engine to implement its preg_* functions. PCRE can match nested patterns using recursion, whereas .NET uses a different approach: balancing groups. And there are other major differences between the two flavors.

So what you want to achieve is definitely possible, but it requires rewriting the regex entirely, since the two approaches are completely different.
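To make the incompatibility concrete, here is a minimal sketch that feeds the unchanged .NET pattern from the question to preg_match. The variable names and the shortened subject string are only illustrative. The point is that PCRE rejects the balancing-group syntax (?<-open>) at compile time, so preg_match returns false (an error) rather than 0 (no match):

```php
<?php
// The .NET pattern from the question, unchanged, wrapped in ~ delimiters.
$dotnetPattern = '~(?:[^,()]+((?:\((?>[^()]+|\((?<open>)|\)(?<-open>))*(?(open)(?!))\)))*)+~';

// Shortened, illustrative subject; the pattern never gets as far as matching it.
$subject = 'a,b,func1(a,b+c)';

// @ silences the compilation warning so that only the return value is inspected.
$result = @preg_match($dotnetPattern, $subject, $matches);

// false means the call failed (the pattern did not compile);
// an ordinary "no match" would have returned 0 instead.
var_dump($result === false);
```

Comparing the return value with === false is what distinguishes a broken pattern from a pattern that simply found nothing.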

Here's such a translation for PCRE (with the g and x flags on regex101; in PHP, that corresponds to preg_match_all with the x modifier):

(?:
  [^,()]++         # match a function name
  (?<args>         # open the "args" group
    \(             # opening parenthesis
    (?:
      [^()]++      # match anything but parentheses
      | \g<args>   # or recurse into "args", make sure the parentheses are balanced
    )*+
    \)             # closing parenthesis
  )*+
)+


Or, in shorter form:

(?:[^,()]++(?<args>\((?:[^()]++|\g<args>)*+\))*+)+
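As a usage sketch, the short form can be passed to preg_match_all to collect every top-level argument from the example string in the question. The ~ delimiter and the variable names are choices made for this illustration, not part of the pattern itself:

```php
<?php
// Example string from the question: the arguments of the outermost function.
$subject = 'a,b,func1(a,b+c),func2(a*b,func3(a+b,c)),func4(e)+func5(f),func6(func7(g,h)+func8(i,(a)=>a+2)),g+2';

// Short form of the PCRE pattern, wrapped in ~ delimiters.
// A single-quoted string keeps the backslashes intact for the regex engine.
$pattern = '~(?:[^,()]++(?<args>\((?:[^()]++|\g<args>)*+\))*+)+~';

// preg_match_all plays the role of the "g" flag: it collects every match.
$count = preg_match_all($pattern, $subject, $matches);

// $matches[0] holds the full matches, one per top-level argument.
$arguments = $matches[0];

print_r($arguments);
```

Commas inside nested parentheses are consumed by the recursive args group, so only the top-level commas separate the matches.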

Upvotes: 3
