Reputation: 31
I have tried a regular expression to split a string on commas that may be followed by a space. The expression handles every case except one. The code I tried is:
List<string> strNewSplit = new List<string>();
Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
foreach (Match match in csvSplit.Matches(input))
{
    strNewSplit.Add(match.Value.TrimStart(','));
}
return strNewSplit;
CASE1: "MYSQL,ORACLE","C#,ASP.NET"
ExpectedOutput:
"MYSQL,ORACLE"
"C#,ASP.NET"
RESULT: PASS
CASE2: "MYSQL,ORACLE", "C#,ASP.NET"
ExpectedOutput:
"MYSQL,ORACLE"
"C#,ASP.NET"
ActualOutput:
"MYSQL,ORACLE"
"C#
ASP.NET"
RESULT: FAIL.
If there is a space after the comma between two double-quoted items, I don't get the expected output. Am I missing anything? Please suggest a better solution.
Upvotes: 3
Views: 2324
Reputation: 656
I normally start by writing down the EBNF of the input I want to parse.
In your case I would say:
List = ListItem {Space* , Space* ListItem}*;
ListItem = '"' Identifier '"'; // Identifier is everything without "
Space = [\t ]+;
This means a List consists of a ListItem followed by zero or more (*) further ListItems, each separated from the previous one by optional spaces, a comma, and optional spaces again.
That led me to the following (you are searching for ListItems):
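using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        // The same list, without and with a space after the separating comma.
        matchRegex("\"MYSQL,ORACLE\",\"C#,ASP.NET\"").ForEach(Console.WriteLine);
        matchRegex("\"MYSQL,ORACLE\", \"C#,ASP.NET\"").ForEach(Console.WriteLine);
    }

    static List<string> matchRegex(string input)
    {
        List<string> strNewSplit = new List<string>();
        // A ListItem: an opening quote, any run of non-quote characters, a closing quote.
        Regex csvSplit = new Regex(
            "(\"(?:[^\"]*)\")"
            , RegexOptions.Compiled);
        foreach (Match match in csvSplit.Matches(input))
        {
            // The pattern never matches a comma, so the match needs no trimming.
            strNewSplit.Add(match.Value);
        }
        return strNewSplit;
    }
}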
This returns the output you wanted for both cases. I hope I understood you correctly.
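If you also want to reject input that does not follow the grammar, rather than only searching for ListItems, one option is to match the whole List rule at once. The following is a minimal sketch under that assumption, not a definitive implementation. The class name QuotedListParser and the method name Parse are hypothetical names I made up for illustration. It relies on .NET allowing the same group name to be used more than once and on Group.Captures keeping every capture in order.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
static class QuotedListParser
{
    // List = ListItem { Space* "," Space* ListItem }, anchored to the whole input.
    // The "item" group captures the text between each pair of double quotes.
    private static readonly Regex ListPattern = new Regex(
        "^\"(?<item>[^\"]*)\"(?:[\\t ]*,[\\t ]*\"(?<item>[^\"]*)\")*$",
        RegexOptions.Compiled);

    // Returns the items without their quotes, or null if the input
    // does not match the grammar.
    public static List<string> Parse(string input)
    {
        Match match = ListPattern.Match(input);
        if (!match.Success)
        {
            return null;
        }

        var items = new List<string>();
        // Captures holds every value the "item" group matched, in order.
        foreach (Capture capture in match.Groups["item"].Captures)
        {
            items.Add(capture.Value);
        }
        return items;
    }
}
```

With this sketch, QuotedListParser.Parse("\"MYSQL,ORACLE\", \"C#,ASP.NET\"") yields the two items MYSQL,ORACLE and C#,ASP.NET without the surrounding quotes, while an unquoted input such as MYSQL,ORACLE yields null. Whether you prefer the items with or without quotes is a design choice; move the quotes inside the group if you want to keep them.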
Upvotes: 1