user2802962

Reputation: 63

regex to extract strings outside single or double quotes

I am building a web page with ASP.NET and C#, and I am having trouble parsing a string supplied by the user. Given a string like the one below, I need to extract the words that are outside the single or double quotes. Can someone help me with this? Thanks in advance.

"we run" live "experiments" inside and outside 'a lab'

The expected result using regex is:

live

inside

and

outside

Upvotes: 4

Views: 1774

Answers (2)

acarlon

Reputation: 17274

This will do it. Every match in which the named group unquote succeeded is one of the words you want:

(?<unquote>[^"'\s]+)|(?:["][^"]+?["])|(?:['][^']+?['])

The C# test code:

using System;
using System.Text.RegularExpressions;

var input = @"""we run"" live ""experiments"" inside and outside 'a lab'";
var pattern = @"(?<unquote>[^""'\s]+)|(?:[""][^""]+?[""])|(?:['][^']+?['])";

foreach (Match match in Regex.Matches(input, pattern))
{
    // Quoted spans match the other alternatives, which leaves "unquote" unset.
    if (match.Groups["unquote"].Success)
    {
        Console.WriteLine(match.Groups["unquote"].Value);
    }
}

Output:

live

inside

and

outside

where:

  • (?<unquote>...) captures whatever it matches in a named group called unquote.
  • [^"'\s]+ matches one or more characters that are not a double quote, a single quote, or whitespace.
  • (?:["][^"]+?["]) matches from a double quote to the next double quote. Note the +? so that it is not greedy and the ?: so that the group is not captured. The last alternative does the same for single quotes.

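This also works when single quotes are nested inside double quotes. Empty quoted strings are the exception: [^"]+? needs at least one character, so "" is not consumed as a quoted span, and replacing +? with * fixes that. Do you want apostrophes (as in don't) to count as part of a word? If so, extend the regex to accept a ' that is not preceded by whitespace: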

(?<unquote>(?>[^"\s](?<!\s[']))+)|(?:["][^"]+?["])|(?:['][^']+?['])
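To show what the apostrophe-aware pattern does in practice, here is a minimal sketch that wraps it in a helper. The class name UnquotedWords and the method Extract are made up for illustration; the pattern itself is the one above, unchanged.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UnquotedWords
{
    // The apostrophe-aware pattern from above, written as a C# verbatim string.
    private static readonly Regex Pattern = new Regex(
        @"(?<unquote>(?>[^""\s](?<!\s[']))+)|(?:[""][^""]+?[""])|(?:['][^']+?['])");

    // Hypothetical helper: returns every token captured by the "unquote" group.
    public static List<string> Extract(string input)
    {
        var words = new List<string>();
        foreach (Match match in Pattern.Matches(input))
        {
            // Quoted spans match the other alternatives and leave "unquote" unset.
            if (match.Groups["unquote"].Success)
            {
                words.Add(match.Groups["unquote"].Value);
            }
        }
        return words;
    }
}
```

Calling UnquotedWords.Extract on "we run" live "experiments" don't stop 'a lab' should return live, don't and stop. One edge this sketch does not handle: a single-quoted span at the very start of the input has no whitespace before its opening ', so the lookbehind does not reject it and its words come back as unquoted.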

Good luck with your live experiments.

Upvotes: 1

I4V

Reputation: 35353

var parts = Regex.Split(input, @"[""'].+?[""']")
            .SelectMany(x => x.Split())
            .Where(s => !String.IsNullOrWhiteSpace(s))
            .ToList();

or

var parts = Regex.Split(input, @"[""'].+?[""']")
            .SelectMany(x => x.Split(new char[]{' '}, StringSplitOptions.RemoveEmptyEntries))
            .ToList();
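For anyone who wants to paste this into a project, here is a self-contained sketch of the second variant with the using directives it needs. The class name SplitOnQuotes and the method Extract are made up for illustration. The idea is to split the input on every quoted span and then split the leftover pieces on spaces. Keep in mind that the pattern accepts either quote character at both ends, so a span can open with " and close with '; that is fine for the sample input but not for text that mixes apostrophes with double quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitOnQuotes
{
    // Hypothetical helper: removes quoted spans, then splits the rest on spaces.
    public static List<string> Extract(string input)
    {
        return Regex.Split(input, @"[""'].+?[""']")
            // Each remaining piece may hold several words (or none at all).
            .SelectMany(piece => piece.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .ToList();
    }
}
```

For the string from the question, SplitOnQuotes.Extract should return live, inside, and, outside.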

Upvotes: 1
