Edgar

Reputation: 4488

Regex for capturing values in a delimited list

I'm trying to write a regex that will extract clean values from a delimited list. The catch is that the list could be delimited by different symbols or words. The captured values will be trimmed in the code, so spaces don't matter.

Input:

English (UK), French* , German and Polish  & Russian; Portugese and Italian

Regex I have so far:

    \A(?:(?<Value>[^,;&*]+)[,;&\s*]*)*\Z

The delimiters I'm expecting are ",", ";" and "&". I included "*" in the character classes because I want it excluded from the captured value.

Captured values:

English (UK), French, German and Polish, Russian, Portugese and Italian

Expected values:

English (UK), French, German, Polish, Russian, Portugese, Italian

The problem I have is that I can't get the word "and" to be treated as a delimiter.

Upvotes: 0

Views: 163

Answers (3)

Bernhard Barker

Reputation: 55609

This is what I came up with:

    \A(?:(?<Value>(?:[^,;&*\s]|\s(?!and))+)(?:(?:and|[,;&\s*])*))*\Z

Explanation:

(?:...) is a non-capturing group: it groups part of the pattern without storing the matched text in a group of its own.

(?!...) is a negative lookahead: it succeeds only if the characters that follow do not match the given pattern, and it consumes no input.

Basically, this only matches whitespace as part of Value if "and" doesn't follow it, and it includes "and" in the separator.
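For anyone who wants to try it, here is a minimal sketch of how the pattern above could be used from C#. The class and method names (ListParser, Parse) are made up for illustration; the pattern itself is copied unchanged. Because the Value group sits inside a repeated group, the individual values are read from its Captures collection rather than from the group's single Value.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class ListParser
{
    // The pattern from the answer, unchanged.
    private const string Pattern =
        @"\A(?:(?<Value>(?:[^,;&*\s]|\s(?!and))+)(?:(?:and|[,;&\s*])*))*\Z";

    // Returns every capture of the repeated Value group, trimmed.
    public static string[] Parse(string input)
    {
        Match match = Regex.Match(input, Pattern);
        if (!match.Success)
            return Array.Empty<string>();

        return match.Groups["Value"].Captures
            .Cast<Capture>()
            .Select(c => c.Value.Trim())   // e.g. "Polish  " becomes "Polish"
            .Where(v => v.Length > 0)
            .ToArray();
    }
}
```

Calling ListParser.Parse on the input from the question should give the seven expected values. One caveat: \s(?!and) also refuses whitespace in front of any word that merely starts with "and" (for example "android"), so writing and\b in both places is probably safer if such values can occur.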

This seems awfully complicated. You may want to replace " and " with a separator first and then use your current expression.
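A rough sketch of that simpler route, assuming "and" is always surrounded by whitespace when it acts as a delimiter. The names (NormalizeThenParse, Parse) are hypothetical; the second pattern is the one from the question, unchanged.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class NormalizeThenParse
{
    // The expression from the question, unchanged.
    private const string OriginalPattern = @"\A(?:(?<Value>[^,;&*]+)[,;&\s*]*)*\Z";

    public static string[] Parse(string input)
    {
        // Assumption: a delimiter "and" always has whitespace on both sides.
        string normalized = Regex.Replace(input, @"\s+and\s+", ",");

        Match match = Regex.Match(normalized, OriginalPattern);
        if (!match.Success)
            return Array.Empty<string>();

        return match.Groups["Value"].Captures
            .Cast<Capture>()
            .Select(c => c.Value.Trim())
            .Where(v => v.Length > 0)
            .ToArray();
    }
}
```

Requiring whitespace on both sides keeps words that merely contain "and" (such as "Poland") intact.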


Upvotes: 1

user2704193

Or just do this to your current result:

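    // Include the surrounding spaces so that words containing "and" (e.g. "Poland") are left alone.
    desiredResult = currentResult.Replace(" and ", ", ");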
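If the current result is available as a list of captured strings rather than one joined string, a similar post-processing step could look like the sketch below. The names (CaptureSplitter, SplitOnAnd) are hypothetical, and it assumes a lowercase "and" standing as a whole word is the only extra delimiter.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class CaptureSplitter
{
    // Splits each already-captured value on the whole word "and".
    public static List<string> SplitOnAnd(IEnumerable<string> captured)
    {
        return captured
            .SelectMany(v => Regex.Split(v, @"\band\b"))  // \b keeps "Poland" in one piece
            .Select(v => v.Trim())
            .Where(v => v.Length > 0)
            .ToList();
    }
}
```

With this, a captured value like "German and Polish" becomes two entries, while "Poland" stays as it is.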

Upvotes: 0

ojlovecd

Reputation: 4892

I think it is not necessary to use Regex here:

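    string str = "English (UK), French* , German and Polish  & Russian; Portugese and Italian";
    // Split on the symbol delimiters plus the word " and " (with its surrounding spaces).
    string[] separators = { ",", ";", "&", "*", " and " };
    string[] results = str.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    foreach (string s in results)
        if (!string.IsNullOrWhiteSpace(s))   // skip whitespace-only pieces, e.g. between "*" and ","
            Console.WriteLine(s.Trim());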
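If a regex is still preferred for the splitting itself, Regex.Split might be a middle ground. This is only a sketch: the names (DelimiterSplit, Split) are hypothetical, and it assumes "and" counts as a delimiter only when it stands as a whole lowercase word.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class DelimiterSplit
{
    // One or more delimiters (symbols or the whole word "and"),
    // each with optional whitespace around it, treated as a single separator.
    private const string Separator = @"(?:\s*(?:[,;&*]|\band\b)\s*)+";

    public static string[] Split(string input)
    {
        return Regex.Split(input, Separator)
            .Select(v => v.Trim())
            .Where(v => v.Length > 0)   // drops empty pieces at the start or end
            .ToArray();
    }
}
```

Because the separator absorbs the whitespace around each delimiter, runs such as "* , " collapse into a single split point and the pieces come out already clean.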

Upvotes: 1
