Reputation: 13
I need to split a string using a semicolon (;) as the delimiter. Semicolons inside parentheses should be ignored.
Example:
string inputString = "(Apple;Mango);(Tiger;Horse);Ant;Frog;";
The output list of strings should be:
(Apple;Mango)
(Tiger;Horse)
Ant
Frog
Other valid input strings include:
string inputString = "The fruits are (mango;apple), and they are good";
The above string should split into a single string:
"The fruits are (mango;apple), and they are good"
string inputString = "The animals in (African (Lion;Elephant) and Asian(Panda; Tiger)) are endangered species; Some plants are endangered too.";
The above string should split into two strings, as shown below:
"The animals in (African (Lion;Elephant) and Asian(Panda; Tiger)) are endangered species"
"Some plants are endangered too."
I searched a lot but could not find the answer to the above scenario.
Does anybody know how to achieve this without reinventing the wheel?
Upvotes: 0
Views: 1323
Reputation: 700582
Use a regular expression that matches what you want to keep, not the separators:
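using System;
using System.Text.RegularExpressions;
string inputString = "(Apple;Mango);(Tiger;Horse);Ant;Frog;";
// Match the items themselves: a parenthesized group, or a run of non-semicolon characters.
MatchCollection m = Regex.Matches(inputString, @"\([^;)]*(;[^;)]*)*\)|[^;]+");
foreach (Match x in m) {
    Console.WriteLine(x.Value);
}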
Output:
(Apple;Mango)
(Tiger;Horse)
Ant
Frog
Expression comments:
\( opening parenthesis
[^;)]* characters before semicolon
(;[^;)]*)* optional semicolon and characters after it
\) closing parenthesis
| or
[^;]+ text with no semicolon
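The same breakdown can be kept next to the pattern itself. The following is a minimal sketch that writes the expression with RegexOptions.IgnorePatternWhitespace, so the comments live inside the pattern; the class name VerbosePatternDemo and the field name ItemPattern are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class VerbosePatternDemo
{
    // IgnorePatternWhitespace ignores unescaped whitespace in the pattern
    // and treats '#' to end of line as a comment.
    public static readonly Regex ItemPattern = new Regex(@"
        \(            # opening parenthesis
        [^;)]*        # characters before semicolon
        (;[^;)]*)*    # optional semicolon and characters after it
        \)            # closing parenthesis
        |             # or
        [^;]+         # text with no semicolon
        ", RegexOptions.IgnorePatternWhitespace);

    static void Main()
    {
        // Prints (Apple;Mango), (Tiger;Horse), Ant, Frog on separate lines.
        foreach (Match x in ItemPattern.Matches("(Apple;Mango);(Tiger;Horse);Ant;Frog;"))
        {
            Console.WriteLine(x.Value);
        }
    }
}
```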
Note: The expression above also accepts values in parentheses without a semicolon, e.g. (Lark), and with multiple semicolons, e.g. (Lark;Pine;Birch). It also skips empty values, e.g. ";;Pine;;;;Birch;;;" will be two items, not ten.
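The pattern above has no notion of nesting, so it does not cover the question's other examples, such as text before a parenthesis or parentheses inside parentheses. For those inputs, a small loop that tracks nesting depth may be simpler than a regex. The following is a minimal sketch, not a tested production implementation: the names ParenAwareSplitter and SplitOutsideParens are hypothetical, and trimming whitespace and skipping empty items are assumptions based on the expected output in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class ParenAwareSplitter
{
    // Hypothetical helper (not part of .NET): splits on ';' only when
    // the current position is outside all parentheses (depth 0).
    public static List<string> SplitOutsideParens(string input)
    {
        var items = new List<string>();
        var current = new StringBuilder();
        int depth = 0;

        foreach (char c in input)
        {
            if (c == '(') depth++;
            else if (c == ')' && depth > 0) depth--;

            if (c == ';' && depth == 0)
            {
                AddIfNotEmpty(items, current);
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        AddIfNotEmpty(items, current); // last item, if the input has no trailing ';'
        return items;
    }

    // Assumption: surrounding whitespace is trimmed and empty items are dropped.
    private static void AddIfNotEmpty(List<string> items, StringBuilder sb)
    {
        string item = sb.ToString().Trim();
        if (item.Length > 0) items.Add(item);
    }

    static void Main()
    {
        string inputString = "The animals in (African (Lion;Elephant) and Asian(Panda; Tiger)) are endangered species; Some plants are endangered too.";
        // Prints two lines: the sentence about animals, then the sentence about plants.
        foreach (string item in SplitOutsideParens(inputString))
        {
            Console.WriteLine(item);
        }
    }
}
```

Unbalanced closing parentheses are ignored here rather than reported; whether that is the right behaviour depends on the input you expect.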
Upvotes: 1
Reputation: 116286
Handle the parenthesized case separately from the "normal" case, so that semicolons inside parentheses are not treated as separators.
A regex to achieve this (matching a single element in your input) may look like the following (not tested):
"\([A-Za-z;]+\)|[A-Za-z]+"
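As a rough check of that pattern, here is a minimal sketch that runs it over the first example from the question. The class and method names (LetterOnlySplitDemo, Items) are made up for illustration. Because the character classes contain only letters, spaces and nested parentheses are not covered, so it only fits inputs shaped like the first example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LetterOnlySplitDemo
{
    // Collects every match of the letters-only pattern from the answer above.
    public static List<string> Items(string input)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, @"\([A-Za-z;]+\)|[A-Za-z]+"))
        {
            result.Add(m.Value);
        }
        return result;
    }

    static void Main()
    {
        // Prints (Apple;Mango), (Tiger;Horse), Ant, Frog on separate lines.
        foreach (string item in Items("(Apple;Mango);(Tiger;Horse);Ant;Frog;"))
        {
            Console.WriteLine(item);
        }
    }
}
```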
Upvotes: 0