Asik

Reputation: 22133

C# Regular expression to match these strings

I have some strings of the following format:

--> ABCDEF_(0) "Abcde fgh"

--> GHIJ4 1

The first one should return 3 matches:

-->
ABCDEF_(0)
"Abcde fgh"

The second one should also return 3 matches:

-->
GHIJ4
1

So what I want to match is:

  1. The arrow (-->)
  2. Groups of non-whitespace, non-quote-surrounded characters
  3. Expressions enclosed in quotes including their whitespace

There could conceivably be more groups of type (2) and (3) in a string, so a single string could have more than just 3 matches.

So far this is what I have:

  var regex = new Regex(
      @"-->" + // match the starting arrow
      @"|[^""\s]*\S+[^""\s]*" + // match elements not surrounded by quotes, trimmed of surrounding whitespace
      @"|""[^""]+"""); // match elements surrounded by quotes

But this doesn't work because it breaks the expressions in quotes, returning for the first string:

-->
ABCDEF_(0)
"Abcde
fgh"

What regular expression would work? If there is a simpler method than regular expressions, I would also accept it.

Upvotes: 1

Views: 308

Answers (2)

Asik

Reputation: 22133

Thanks to an answer that got quickly deleted for some reason, I've managed to solve the problem.

Ideas:

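  • The first group "-->" is redundant, because the alternative for unquoted elements already matches the arrow.
  • The second and third groups should be swapped, so that the quoted alternative is tried before the unquoted one.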

Resulting regex:

Regex sWordMatch = new Regex(
      @"""[^""]*""" + // groups of characters enclosed in quotes
      @"|[^""\s]*\S+[^""\s]*"); // groups of non-whitespace characters not enclosed in quotes
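For illustration, here is a minimal sketch of how this pattern could be applied to the two sample strings from the question. The `TokenizerDemo` class and `Tokenize` helper are hypothetical names, not part of the original post. The quoted alternative comes first because .NET tries alternatives from left to right; with the original order, the unquoted branch consumes the opening quote before the quoted branch is ever tried.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenizerDemo
{
    // Quoted alternative first: alternation is tried left to right.
    private static readonly Regex WordMatch = new Regex(
        @"""[^""]*""" +            // groups of characters enclosed in quotes
        @"|[^""\s]*\S+[^""\s]*");  // groups of non-whitespace characters not enclosed in quotes

    // Hypothetical helper: returns every token found in the line, in order.
    public static string[] Tokenize(string line)
    {
        return WordMatch.Matches(line)
                        .Cast<Match>()
                        .Select(m => m.Value)
                        .ToArray();
    }

    public static void Main()
    {
        // Prints: -->  ABCDEF_(0)  "Abcde fgh"  (one token per line)
        foreach (var token in Tokenize("--> ABCDEF_(0) \"Abcde fgh\""))
            Console.WriteLine(token);

        // Prints: -->  GHIJ4  1  (one token per line)
        foreach (var token in Tokenize("--> GHIJ4 1"))
            Console.WriteLine(token);
    }
}
```

Note that the arrow is still returned as the first token even though it has no alternative of its own: it contains neither whitespace nor quotes, so the unquoted branch picks it up.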

Upvotes: 0

Martin Ernst

Reputation: 5679

It would be easier to use captures (I've used named captures here):

var regex = new Regex(@"-->" // match the arrow
    + @"\s+(?<first>[^\s]+)" // capture the first part always unquoted
    + @"(\s+(?<second>(""[^""]+"")|[^\s]+))+"); // capture the second part, possibly quoted

var match = regex.Match("--> ABCDEF_(0) \"Abcde fgh\"");
Console.WriteLine(match.Groups["first"].Value);
Console.WriteLine(match.Groups["second"].Value);

match = regex.Match("--> GHIJ4 1");
Console.WriteLine(match.Groups["first"].Value);
Console.WriteLine(match.Groups["second"].Value);

match = regex.Match("--> GHIJ4 1 \"Test Something\" \"Another String With Spaces\" \"And yet another one\"");
Console.WriteLine(match.Groups["first"].Value);
Console.WriteLine("Total matches:" + match.Groups["second"].Captures.Count);
Console.WriteLine(match.Groups["second"].Captures[0].Value);
Console.WriteLine(match.Groups["second"].Captures[1].Value);
Console.WriteLine(match.Groups["second"].Captures[2].Value);
Console.WriteLine(match.Groups["second"].Captures[3].Value);
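The hard-coded indices above assume that the input yields at least four captures; shorter input would make the indexer throw an out-of-range exception. As a minimal sketch, assuming the same pattern, the captures can be iterated instead. The `CaptureDemo` class and `Parts` helper are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Same pattern as in the answer above.
    private static readonly Regex Pattern = new Regex(
        @"-->"                                       // match the arrow
        + @"\s+(?<first>[^\s]+)"                     // first element, unquoted
        + @"(\s+(?<second>(""[^""]+"")|[^\s]+))+");  // one or more further elements, possibly quoted

    // Hypothetical helper: returns the first element followed by every "second" capture.
    public static List<string> Parts(string line)
    {
        var parts = new List<string>();
        Match match = Pattern.Match(line);
        if (!match.Success)
            return parts; // no arrow, or fewer than two elements after it

        parts.Add(match.Groups["first"].Value);
        foreach (Capture capture in match.Groups["second"].Captures)
            parts.Add(capture.Value);
        return parts;
    }

    public static void Main()
    {
        foreach (var part in Parts("--> GHIJ4 1 \"Test Something\" \"Another String With Spaces\""))
            Console.WriteLine(part);
    }
}
```

With this approach the arrow itself is not returned, and the pattern requires at least two elements after the arrow, so a line such as `--> GHIJ4` produces an empty list.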

Upvotes: 1
