Reputation: 33
I need to split the text by commas, but the text also contains commas inside brackets, which need to be ignored.
Input text: Selectroasted peanuts,Sugars (sugar, fancymolasses),Hydrogenatedvegetable oil (cottonseed and rapeseed oil),Salt.
Expected output:
My code:
string pattern = @"\s*(?:""[^""]*""|\([^)]*\)|[^, ]+)";
string input = "Selectroasted peanuts,Sugars (sugar, fancymolasses),Hydrogenatedvegetable oil (cottonseed and rapeseed oil),Salt.";
foreach (Match m in Regex.Matches(input, pattern))
{
    Console.WriteLine("{0}", m.Value);
}
The output I am getting:
Please help.
Upvotes: 2
Views: 309
Reputation: 627022
You can use
string pattern = @"(?:""[^""]*""|\([^()]*\)|[^,])+";
string input = "Selectroasted peanuts,Sugars (sugar, fancymolasses),Hydrogenatedvegetable oil (cottonseed and rapeseed oil),Salt.";
foreach (Match m in Regex.Matches(input.TrimEnd(new[] {'!', '?', '.', '…'}), pattern))
{
    Console.WriteLine("{0}", m.Value);
}
// => Selectroasted peanuts
//    Sugars (sugar, fancymolasses)
//    Hydrogenatedvegetable oil (cottonseed and rapeseed oil)
//    Salt
See the C# demo. See the regex demo, too. It matches one or more occurrences of
- "[^"]*" - a ", zero or more chars other than ", and then a "
- | - or
- \([^()]*\) - a (, then zero or more chars other than ( and ), and then a ) char
- | - or
- [^,] - a char other than a ,
Note the .TrimEnd(new[] {'!', '?', '.', '…'}) part in the code snippet is meant to remove the trailing sentence punctuation, but if you can afford Salt. in the output, you can remove that part.
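If the bracketed parts can be nested, the \([^()]*\) alternative above will not match them, because it only handles one level of parentheses. As a possible alternative, here is a minimal sketch without regex that tracks the parenthesis depth and splits only on commas at depth zero. The IngredientSplitter class and the SplitTopLevel method are hypothetical names made up for this illustration, and trimming each item is an assumption about the desired output.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class IngredientSplitter
{
    // Hypothetical helper (not from the answer above): splits on commas
    // that sit outside parentheses by tracking the nesting depth.
    public static List<string> SplitTopLevel(string input)
    {
        var parts = new List<string>();
        var current = new StringBuilder();
        int depth = 0;

        foreach (char c in input)
        {
            if (c == '(')
            {
                depth++;
            }
            else if (c == ')' && depth > 0)
            {
                depth--;
            }

            if (c == ',' && depth == 0)
            {
                // Top-level comma: finish the current item.
                parts.Add(current.ToString().Trim());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        // Add the last item unless it is empty after trimming.
        string last = current.ToString().Trim();
        if (last.Length > 0)
        {
            parts.Add(last);
        }
        return parts;
    }

    public static void Main()
    {
        string input = "Selectroasted peanuts,Sugars (sugar, fancymolasses),Hydrogenatedvegetable oil (cottonseed and rapeseed oil),Salt.";
        // Assumption: only a trailing period needs to be stripped here.
        foreach (string part in SplitTopLevel(input.TrimEnd('.')))
        {
            Console.WriteLine(part);
        }
    }
}
```

Unlike the regex, this sketch does not treat double-quoted substrings specially, so a comma inside quotes would still split the text.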
Upvotes: 2