Jiniv Thakkar

Reputation: 33

Regex to split and ignore brackets

I need to split the text on commas, but the text also contains commas inside brackets, and those need to be ignored.

Input text: Selectroasted peanuts,Sugars (sugar, fancymolasses),Hydrogenatedvegetable oil (cottonseed and rapeseed oil),Salt.

Expected output:

My code:

string pattern = @"\s*(?:""[^""]*""|\([^)]*\)|[^, ]+)";
string input = "Selectroasted peanuts,Sugars (sugar, fancymolasses),Hydrogenatedvegetable oil (cottonseed and rapeseed oil),Salt.";
foreach (Match m in Regex.Matches(input, pattern))
{
    Console.WriteLine("{0}", m.Value);
}

The output I am getting:

Please help.

Upvotes: 2

Views: 309

Answers (1)

Wiktor Stribiżew

Reputation: 627022

You can use the following pattern:

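using System;
using System.Text.RegularExpressions;

// Match runs of quoted substrings, parenthesized substrings, or non-comma chars
string pattern = @"(?:""[^""]*""|\([^()]*\)|[^,])+";
string input = "Selectroasted peanuts,Sugars (sugar, fancymolasses),Hydrogenatedvegetable oil (cottonseed and rapeseed oil),Salt.";
// Strip trailing sentence punctuation before matching
foreach (Match m in Regex.Matches(input.TrimEnd(new[] {'!', '?', '.', '…'}), pattern))
{
    Console.WriteLine("{0}", m.Value);
}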
// => Selectroasted peanuts
//    Sugars (sugar, fancymolasses)
//    Hydrogenatedvegetable oil (cottonseed and rapeseed oil)
//    Salt

See the C# demo and the regex demo. The pattern matches one or more occurrences of any of the following alternatives:

  • "[^"]*" - a " char, then zero or more chars other than ", and then a " char (each " is written as "" inside the C# verbatim string literal)
  • | - or
  • \([^()]*\) - a ( char, then zero or more chars other than ( and ), and then a ) char
  • | - or
  • [^,] - any single char other than a comma.
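
To see how the three alternatives cooperate, here is a minimal sketch that wraps the same pattern in a helper and runs it on a made-up input with a quoted part and a parenthesized part. The helper name SplitItems and the sample string are hypothetical, chosen only for illustration, and the Trim() call on each item is an addition that is not part of the original snippet.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the answer): returns every match of the
// answer's pattern, with surrounding whitespace trimmed from each item.
static string[] SplitItems(string text)
{
    const string pattern = @"(?:""[^""]*""|\([^()]*\)|[^,])+";
    return Regex.Matches(text, pattern)
                .Cast<Match>()
                .Select(m => m.Value.Trim())
                .ToArray();
}

// Commas inside "..." and (...) are consumed by the first two alternatives,
// so only the top-level commas end a match.
string[] items = SplitItems("a \"b, c\" d,e (f, g),h");
foreach (string item in items)
{
    Console.WriteLine(item);
}
// => a "b, c" d
//    e (f, g)
//    h
```

Because [^()]* excludes both bracket characters, a parenthesized group is only consumed as a unit when it contains no nested parentheses; with nested or unbalanced brackets, the [^,] alternative takes over one character at a time and inner commas may split the text again.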

Note that the .TrimEnd(new[] {'!', '?', '.', '…'}) call in the code snippet removes the trailing sentence punctuation. If you can afford to have Salt. (with the period) in the output, you can remove that call.

Upvotes: 2
