user1571734

Reputation: 13

Split a string on semicolons, ignoring semicolons inside parentheses

I need to split a string using the semicolon (;) as the delimiter. Semicolons inside parentheses should be ignored.

Example:

string inputString = "(Apple;Mango);(Tiger;Horse);Ant;Frog;";

The output list of strings should be:

(Apple;Mango)
(Tiger;Horse)
Ant
Frog

Other valid input strings include:

string inputString = "The fruits are (mango;apple), and they are good";

The above string should not be split; the result is a single string:

"The fruits are (mango;apple), and they are good"

string inputString = "The animals in (African (Lion;Elephant) and Asian(Panda; Tiger)) are endangered species; Some plants are endangered too.";

The above string should split to two strings as shown below:

"The animals in (African (Lion;Elephant) and Asian(Panda; Tiger)) are endangered species"
"Some plants are endangered too."

I searched a lot but could not find an answer that covers this scenario.

Does anybody know how to achieve this without reinventing the wheel?

Upvotes: 0

Views: 1323

Answers (2)

Guffa

Reputation: 700582

Use a regular expression that matches what you want to keep, not the separators:

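using System;
using System.Text.RegularExpressions;

string inputString = "(Apple;Mango);(Tiger;Horse);Ant;Frog;";

// Match the items themselves: either a parenthesized group or a run of non-semicolon characters.
MatchCollection m = Regex.Matches(inputString, @"\([^;)]*(;[^;)]*)*\)|[^;]+");

foreach (Match x in m){
  Console.WriteLine(x.Value);
}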

Output:

(Apple;Mango)
(Tiger;Horse)
Ant
Frog

Expression comments:

\(           opening parenthesis
[^;)]*       characters other than a semicolon or closing parenthesis
(;[^;)]*)*   zero or more repetitions of a semicolon and the characters after it
\)           closing parenthesis
|            or
[^;]+        text with no semicolon

Note: The expression above also accepts values in parentheses without a semicolon, e.g. (Lark), and with multiple semicolons, e.g. (Lark;Pine;Birch). It also skips empty values, e.g. ";;Pine;;;;Birch;;;" yields two items, not ten.
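The expression above only recognizes a parenthesized group that starts an item and contains no nested parentheses, so it would not cover the question's other examples (text before the parentheses, or nested groups). For those cases, one alternative is to skip the regex and scan the string once while counting open parentheses. The following is only a minimal sketch of that idea, not a tested library routine: the names SemicolonSplitter and SplitOutsideParentheses are made up for illustration, and it assumes that empty items should be dropped, that surrounding whitespace should be trimmed, and that a stray closing parenthesis is kept as ordinary text.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of the original answer.
static class SemicolonSplitter
{
    public static List<string> SplitOutsideParentheses(string input)
    {
        var parts = new List<string>();
        var current = new StringBuilder();
        int depth = 0; // number of parentheses currently open

        foreach (char c in input)
        {
            if (c == '(')
            {
                depth++;
            }
            else if (c == ')' && depth > 0)
            {
                depth--; // an unmatched ')' leaves the depth at zero
            }

            if (c == ';' && depth == 0)
            {
                // Top-level separator: emit the piece collected so far.
                AddIfNotEmpty(parts, current);
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        AddIfNotEmpty(parts, current); // text after the last separator
        return parts;
    }

    // Assumption: items are trimmed and empty items are skipped.
    private static void AddIfNotEmpty(List<string> parts, StringBuilder piece)
    {
        string text = piece.ToString().Trim();
        if (text.Length > 0)
        {
            parts.Add(text);
        }
    }
}
```

Calling SemicolonSplitter.SplitOutsideParentheses(inputString) on the three strings from the question should give four, one, and two items respectively. If an opening parenthesis is never closed, everything after it is treated as being inside the group, so the rest of the string ends up in a single item.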

Upvotes: 1

Péter Török

Reputation: 116286

Handle the parenthesized case separately from the "normal" case, so that semicolons inside parentheses are not treated as separators.

A regex to achieve this (matching a single element in your input) may look like the following (not tested):

@"\([A-Za-z;]+\)|[A-Za-z]+"

Upvotes: 0
