Footch

Reputation: 129

Parsing a list of functions and their parameters from a string

I have a string which contains some functions (I know their names) and their parameters like this: translate(700 210) rotate(-30)

I would like to parse each one of them in a string array starting with the function name followed by the parameters.

I don't know much about regex, and so far I have this:

MatchCollection matches = Regex.Matches(attribute.InnerText, @"((translate|rotate|scale|matrix)\s*\(\s*(-?\d+\s*\,*\s*)+\))*");
for (int i = 0; i < matches.Count; i++)
{
    Console.WriteLine(matches[i].Value);
}

What this returns is:

translate(700 210)
[blank space]
rotate(-30)
[blank space]

This works for me because I can run another regular expression on each row of the resulting collection and get the contents. What I don't understand is why blank rows are returned between the methods.

Also, is running a regex twice (once to separate the methods and once to actually parse them) a good approach?

Thanks!

Upvotes: 1

Views: 475

Answers (2)

jdweng

Reputation: 34429

I like using Regex with a dictionary:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication56
{
    class Program
    {
        static void Main(string[] args)
        {

            Dictionary<string, string> dict = new Dictionary<string, string>();

            string input = "translate(700 210) rotate(-30)";
            // \w+ keeps the whitespace between functions out of the command name.
            string pattern = @"(?'command'\w+)\s*\((?'value'[^\)]+)\)";

            MatchCollection matches = Regex.Matches(input, pattern);

            foreach (Match match in matches)
            {
                // Add throws ArgumentException if the same function name appears twice in the input.
                dict.Add(match.Groups["command"].Value, match.Groups["value"].Value);
            }

        }
    }

}

Upvotes: 0

Regex.Matches will match your entire regular expression multiple times. It finds one match for the whole thing, then finds the next match for the whole thing.

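The outermost parens with * say that you're willing to accept zero or more repetitions of the group's contents as a match. Zero repetitions is an empty string, and an empty match succeeds at any position. So wherever no function starts (the space between the two functions, and the end of the string), the regex happily returns an empty match. Those are your blank rows. That is not your intent. You want exactly one.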
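To see where the empty matches come from, here is a small sketch that prints the index and length of every match, first for the pattern from the question and then for the same pattern without the outer group. The class and method names (EmptyMatchDemo, Describe) are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmptyMatchDemo
{
    // The pattern from the question, wrapped in an outer (...)* group.
    public const string Original =
        @"((translate|rotate|scale|matrix)\s*\(\s*(-?\d+\s*\,*\s*)+\))*";

    // The same pattern with the outer group and its * removed.
    public const string WithoutStar =
        @"(translate|rotate|scale|matrix)\s*\(\s*(-?\d+\s*\,*\s*)+\)";

    // Returns "index:length" for every match so that empty matches are visible.
    public static string[] Describe(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => $"{m.Index}:{m.Length}")
            .ToArray();
    }

    public static void Main()
    {
        const string input = "translate(700 210) rotate(-30)";

        // Four matches, two of them with length 0.
        Console.WriteLine(string.Join(" ", Describe(input, Original)));

        // Two matches, none of them empty.
        Console.WriteLine(string.Join(" ", Describe(input, WithoutStar)));
    }
}
```

The zero-length entries sit at index 18 (the space) and index 30 (the end of the string), which are exactly the positions where no function name starts.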

The blanks are harmless, but "zero or more" also includes two. Consider this string, with no space between the two functions:

var text = "translate(700 210)rotate(-30)";

That's one match, according to the regex you provided. You'll get "rotate" and "-30". If the missing space is an error, detect it and warn the user. If you're not going to do that, parse it correctly.
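A quick way to check this yourself is sketched below. It runs the pattern from the question against the string without the space and looks at group 2 (the function name) of the first match. MergedMatchDemo and NamesInFirstMatch are illustrative names, not part of any library.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MergedMatchDemo
{
    // The pattern from the question, including the outer (...)* group.
    private const string Original =
        @"((translate|rotate|scale|matrix)\s*\(\s*(-?\d+\s*\,*\s*)+\))*";

    // Returns every function name that group 2 captured inside the first match.
    public static string[] NamesInFirstMatch(string input)
    {
        Match first = Regex.Match(input, Original);
        return first.Groups[2].Captures
            .Cast<Capture>()
            .Select(c => c.Value)
            .ToArray();
    }

    public static void Main()
    {
        const string text = "translate(700 210)rotate(-30)";
        Match first = Regex.Match(text, Original);

        Console.WriteLine(first.Value);            // the whole string is a single match
        Console.WriteLine(first.Groups[2].Value);  // Value keeps only the last name captured
        Console.WriteLine(string.Join(",", NamesInFirstMatch(text)));
    }
}
```

Both names are still reachable through Captures, but Groups[2].Value on its own silently drops "translate".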

So let's get rid of the outermost parens and that *. We'll also name the capturing groups, for readability.

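var matches = Regex.Matches(text, @"(?<funcName>translate|rotate|scale|matrix)\s*\(\s*(?:(?<param>-?\s*\d+)\s*,*\s*)+\)");

foreach (Match match in matches)
{
    var funcName = match.Groups["funcName"].Value;

    // Group.Value holds only the last repetition of a repeated group; Captures holds all of them.
    foreach (Capture capture in match.Groups["param"].Captures)
    {
        // Remove any whitespace between the sign and the digits before parsing.
        var param = Int32.Parse(Regex.Replace(capture.Value, @"\s", ""));

        Console.WriteLine($"{funcName}( {param} )");
    }
}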

I also stuck in \s* after the optional -, just in case.
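As for running the regex twice: one pass can be enough, because .NET keeps every repetition of a group in its Captures collection. Below is a minimal sketch that produces the string arrays asked for in the question (name first, then the parameters). TransformParser and Parse are hypothetical names, and the pattern accepts integers only, like the one in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TransformParser
{
    // Separators sit outside the named group, so each capture is just the number.
    private static readonly Regex FunctionPattern = new Regex(
        @"(?<funcName>translate|rotate|scale|matrix)\s*\(\s*(?:(?<param>-?\d+)\s*,?\s*)+\)");

    // Returns one array per function: the name first, then each parameter.
    public static List<string[]> Parse(string text)
    {
        var result = new List<string[]>();
        foreach (Match match in FunctionPattern.Matches(text))
        {
            var parts = new List<string> { match.Groups["funcName"].Value };
            parts.AddRange(match.Groups["param"].Captures.Cast<Capture>().Select(c => c.Value));
            result.Add(parts.ToArray());
        }
        return result;
    }

    public static void Main()
    {
        foreach (string[] call in Parse("translate(700 210) rotate(-30)"))
        {
            Console.WriteLine(string.Join(" ", call));
        }
    }
}
```

For the sample input this gives two arrays, { "translate", "700", "210" } and { "rotate", "-30" }. Two passes are not wrong, but the second one repeats work the first regex has already done.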

Upvotes: 2
