n1nsa1d00

Reputation: 876

Best C# regex pattern to get a function's parameter values in a raw string?

I'm parsing html code in a C# project.

Assuming that we have this string:

<a href="javascript:func('data1','data2'...)">...</a>

Or, after the necessary .Substring() calls, this one:

func('data1','data2'...)

What would be the best regex pattern to retrieve func()'s parameters without relying on the delimiter characters (' and ,), since they can sometimes be part of a parameter's string?

Upvotes: 2

Views: 2069

Answers (2)

jdweng

Reputation: 34429

Try this

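            // Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
            string input = "<a href=\"javascript:func('data1','data2'...)\">...</a>";

            // Step 1: capture everything between the parentheses of the call.
            string pattern1 = @"\w+\((?'parameters'[^\)]+)\)";
            Regex expr1 = new Regex(pattern1);
            Match match1 = expr1.Match(input);
            string parameters = match1.Groups["parameters"].Value;

            // Step 2: treat every run of word characters as one parameter.
            // Note: a parameter containing quotes, commas, spaces or ')' will be split or cut short.
            string pattern2 = @"\w+";
            Regex expr2 = new Regex(pattern2);
            MatchCollection matches = expr2.Matches(parameters);

            List<string> results = new List<string>();
            foreach (Match match in matches)
            {
                results.Add(match.Value);
            }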

Upvotes: -2

Mariano

Reputation: 6511

You should not use a regex to parse programming-language code, because such code is not a regular language. This article explains why: Can regular expressions be used to match nested patterns?


And to prove my point, allow me to share an actual solution with a regex that I think will match what you want:

^                               # Start of string
[^()'""]+\(                     # matches `func(`
                                #
(?>                             # START - Iterator (match each parameter)
 (?(param)\s*,(?>\s*))          # if it's not the 1st parameter, start with a `,`
 (?'param'                      # opens 'param' (main group, captures each parameter)
                                #
   (?>                          # Group: matches every char in parameter
      (?'qt'['""])              #  ALTERNATIVE 1: strings (matches ""foo"",'ba\'r','g)o\'o')
      (?:                       #   match anything inside quotes
        [^\\'""]+               #    any char except quotes or escapes
        |(?!\k'qt')['""]        #    or the quotes not used here (ie ""double'quotes"")
        |\\.                    #    or any escaped char
      )*                        #   repeat: *
      \k'qt'                    #   close quotes
   |  (?'parens'\()             #  ALTERNATIVE 2: `(` open nested parens (nested func)
   |  (?'-parens'\))            #  ALTERNATIVE 3: `)` close nested parens
   |  (?'braces'\{)             #  ALTERNATIVE 4: `{` open braces
   |  (?'-braces'})             #  ALTERNATIVE 5: `}` close braces
   |  [^,(){}\\'""]             #  ALTERNATIVE 6: anything else (var, funcName, operator, etc)
   |  (?(parens),)              #  ALTERNATIVE 7: `,` a comma if inside parens
   |  (?(braces),)              #  ALTERNATIVE 8: `,` a comma if inside braces
   )*                           # Repeat: *
                                # CONDITIONS:
  (?(parens)(?!))               #  a. balanced parens
  (?(braces)(?!))               #  b. balanced braces
  (?<!\s)                       #  c. no trailing spaces
                                #
 )                              # closes 'param'
)*                              # Repeat the whole thing once for every parameter
                                #
\s*\)\s*(?:;\s*)?               # matches `)` at the end of func(), maybe with a `;`
$                               # END

One-liner:

^[^()'""]+\((?>(?(param)\s*,(?>\s*))(?'param'(?>(?'qt'['""])(?:[^\\'""]+|(?!\k'qt')['""]|\\.)*\k'qt'|(?'parens'\()|(?'-parens'\))|(?'braces'\{)|(?'-braces'})|[^,(){}\\'""]|(?(parens),)|(?(braces),))*(?(parens)(?!))(?(braces)(?!))(?<!\s)))*\s*\)\s*(?:;\s*)?$

Test online
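To read the individual parameters back out in C#, one option is to rely on the fact that .NET keeps every capture of a repeated group in Group.Captures. What follows is a minimal sketch, assuming the one-liner above is pasted unchanged into a verbatim string. The names ParameterExtractor and Extract, and the sample input, are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ParameterExtractor
{
    // The one-liner from above. In a C# verbatim string, "" is an escaped double quote.
    const string Pattern =
        @"^[^()'""]+\((?>(?(param)\s*,(?>\s*))(?'param'(?>(?'qt'['""])(?:[^\\'""]+|(?!\k'qt')['""]|\\.)*\k'qt'|(?'parens'\()|(?'-parens'\))|(?'braces'\{)|(?'-braces'})|[^,(){}\\'""]|(?(parens),)|(?(braces),))*(?(parens)(?!))(?(braces)(?!))(?<!\s)))*\s*\)\s*(?:;\s*)?$";

    static readonly Regex CallRegex = new Regex(Pattern);

    // Hypothetical helper: returns one entry per capture of the 'param' group,
    // or an empty list when the input does not match at all.
    public static List<string> Extract(string call)
    {
        var values = new List<string>();
        Match m = CallRegex.Match(call);
        if (!m.Success)
            return values;

        // 'param' is matched once per parameter; Captures holds all of them in order.
        foreach (Capture c in m.Groups["param"].Captures)
            values.Add(c.Value);
        return values;
    }

    static void Main()
    {
        // Sample input with a comma and a ')' inside a quoted parameter.
        foreach (string p in Extract("func('data1', 'da,ta)2', other(1, 2))"))
            Console.WriteLine(p);
    }
}
```

The captured values still include their surrounding quotes, so stripping them (and unescaping) is left to the caller. The commented multi-line version of the pattern would additionally need RegexOptions.IgnorePatternWhitespace.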

As you can imagine by now (if you're still reading), even with an indented pattern and with comments for every construct, this regex is unreadable, quite difficult to maintain and almost impossible to debug... And I would guess there are inputs that make it fail.

Just in case a stubborn mind is still interested, here's a link to the logic behind it: Matching Nested Constructs with Balancing Groups (regular-expressions.info)
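For comparison with the regex above, the same splitting can be done with a small hand-written scanner that tracks quote state and nesting depth. This is only a sketch under assumptions, not a JavaScript parser: it handles single and double quotes, backslash escapes, and nested () and {}, and it assumes the last `)` in the string closes the call. ArgumentScanner and Split are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class ArgumentScanner
{
    // Splits the text between the first '(' and the last ')' on top-level commas only.
    public static List<string> Split(string call)
    {
        var result = new List<string>();
        int open = call.IndexOf('(');
        int close = call.LastIndexOf(')');
        if (open < 0 || close < open)
            return result;

        var current = new StringBuilder();
        int depth = 0;       // nesting level of () and {}
        char quote = '\0';   // quote character we are currently inside, or '\0'

        for (int i = open + 1; i < close; i++)
        {
            char c = call[i];
            if (quote != '\0')
            {
                current.Append(c);
                if (c == '\\' && i + 1 < close)
                    current.Append(call[++i]);   // keep the escaped character as-is
                else if (c == quote)
                    quote = '\0';                // closing quote
            }
            else if (c == '\'' || c == '"') { quote = c; current.Append(c); }
            else if (c == '(' || c == '{') { depth++; current.Append(c); }
            else if (c == ')' || c == '}') { depth--; current.Append(c); }
            else if (c == ',' && depth == 0)
            {
                // A comma outside quotes and nesting ends the current parameter.
                result.Add(current.ToString().Trim());
                current.Clear();
            }
            else current.Append(c);
        }

        if (current.Length > 0 || result.Count > 0)
            result.Add(current.ToString().Trim());
        return result;
    }

    static void Main()
    {
        foreach (string p in Split("func('data1', 'da,ta)2', other(1, 2))"))
            Console.WriteLine(p);
        // prints:
        // 'data1'
        // 'da,ta)2'
        // other(1, 2)
    }
}
```

Unlike the regex, this version does not validate that quotes and brackets are balanced; it simply splits, which keeps each rule on its own line and easy to step through in a debugger.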

Upvotes: 5
