Mary Grass

Reputation: 94

Regex for obtaining numeric values within a string in C#

I have the following example strings:

TAR:100
TAR:100|LED:50
TAR:30|LED:30|ASO:40

I need a regex that obtains the numeric values after the colon, which are always in the range 0 to 100 inclusive.

The result after the regex is applied to any of the above strings should be:

for TAR:100 the result should be 100

for TAR:100|LED:50 the result should be the array [100,50]

for TAR:30|LED:30|ASO:40 the result should be the array [30,30,40]

The word before the colon can be of any length and may contain both uppercase and lowercase letters.

I have tried with the following but it doesn't yield the result I need:

 String text = "TAR:100|LED:50";
 String pattern = "\\|?([a-zA-Z]{1,}:)";
 string[] values= Regex.Split(text, pattern);

The regex should work whether the string is TAR:100 or TAR:100|LED:50 if possible.

Upvotes: 1

Views: 94

Answers (2)

Peter B

Reputation: 24137

Your pattern wraps the separator in a capture group ( ). Regex.Split includes captured text in the result array, so the parts you want to remove are returned as well.
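The effect of the capture group can be checked with a small sketch. It uses the pattern from the question with and without the parentheses; the variable names are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "TAR:100|LED:50";

// With a capture group, the captured separators are included in the result.
string[] withGroup = Regex.Split(text, @"\|?([a-zA-Z]{1,}:)");

// Without the group, only the text between the separators is returned.
string[] withoutGroup = Regex.Split(text, @"\|?[a-zA-Z]{1,}:");

Console.WriteLine(string.Join(",", withGroup));    // ,TAR:,100,LED:,50
Console.WriteLine(string.Join(",", withoutGroup)); // ,100,50
```

Both results start with an empty string because the input begins with a separator.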

Below is my solution, with a slightly changed regex.

Note that the loop over the values must start at i = 1. This is purely a consequence of calling Split on a string that begins with a separator; it has nothing to do with the regex itself.
Explanation: if we used the simpler string.Split with the separator "#", then "a#b#c" would produce ["a", "b", "c"], whereas "#b#c" would produce ["", "b", "c"]. In general, if Split removes N separators from the string, the result contains N+1 strings. All the strings we deal with here have the form "#b#c", so the first result is always empty.
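The N+1 rule can be verified with plain string.Split. This is a minimal sketch; the last line shows StringSplitOptions.RemoveEmptyEntries as one possible alternative to skipping index 0, which is an addition here and not part of the solution in this answer.

```csharp
using System;

string[] noLeading = "a#b#c".Split('#'); // ["a", "b", "c"]
string[] leading = "#b#c".Split('#');    // ["", "b", "c"]

Console.WriteLine(noLeading.Length);     // 3
Console.WriteLine(leading.Length);       // 3
Console.WriteLine(leading[0] == "");     // True

// Alternative (assumption, not used in the answer): drop empty entries up front.
string[] trimmed = "#b#c".Split(new[] { '#' }, StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(trimmed.Length);       // 2
```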

Accepting that as a given fact, the results are usable by starting from i = 1:

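using System;
using System.Text.RegularExpressions;

// Same separator as in the question, but without the capture group.
var pattern = @"\|?[a-zA-Z]+:";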
var testCases = new[] { "TAR:100", "TAR:100|LED:50", "TAR:30|LED:30|ASO:40" };
foreach (var text in testCases)
{
    string[] values = Regex.Split(text, pattern);
    for (var i = 1; i < values.Length; i++)
        Console.WriteLine(values[i]);
    Console.WriteLine("------------");
}

Output:

100
------------
100
50
------------
30
30
40
------------

Working DotNetFiddle: https://dotnetfiddle.net/i9kH8n

Upvotes: 1

The fourth bird

Reputation: 163207

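In .NET, a named group can be repeated, and every value it matched is available through Group.Captures. Using the same name for two capture groups lets a single pattern match the whole format of the string and collect all the numbers.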

\b[a-zA-Z]+:(?<numbers>[0-9]+)(?:\|[a-zA-Z]+:(?<numbers>[0-9]+))*\b

using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] strings = {
    "TAR:100",
    "TAR:100|LED:50",
    "TAR:30|LED:30|ASO:40"
};
string pattern = @"\b[a-zA-Z]+:(?<numbers>[0-9]+)(?:\|[a-zA-Z]+:(?<numbers>[0-9]+))*\b";
foreach (String str in strings)
{
    Match match = Regex.Match(str, pattern);

    if (match.Success)
    {
        string[] result = match.Groups["numbers"].Captures.Select(c => c.Value).ToArray();
        Console.WriteLine(String.Join(',', result));
    }
}

Output

100
100,50
30,30,40

Another option is to use the \G anchor, which asserts the position where the previous match ended, and to read each value from capture group 1.

\b(?:[a-zA-Z]+:|\G(?!^))([0-9]+)(?:\||$)

using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] strings = {
    "TAR:100",
    "TAR:100|LED:50",
    "TAR:30|LED:30|ASO:40"
};
string pattern = @"\b(?:[a-zA-Z]+:|\G(?!^))([0-9]+)(?:\||$)";
foreach (String str in strings)
{
    MatchCollection matches = Regex.Matches(str, pattern);
    string[] result = matches.Select(m => m.Groups[1].Value).ToArray();

    Console.WriteLine(String.Join(',', result));
}

Output

100
100,50
30,30,40
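If the values are needed as integers and should respect the 0 to 100 range mentioned in the question, something along these lines might work. This is only a sketch: ExtractValues is a hypothetical helper, the lookbehind pattern is a simpler swapped-in alternative to the two patterns above, and silently dropping out-of-range values is an assumption about the desired behavior.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the answer): returns the numbers after each
// "word:" as int[], keeping only values in the range 0..100.
int[] ExtractValues(string input) =>
    Regex.Matches(input, @"(?<=[a-zA-Z]:)[0-9]+")
        .Cast<Match>()
        // Digit runs too long for an int are mapped to -1 and filtered out below.
        .Select(m => int.TryParse(m.Value, out int v) ? v : -1)
        .Where(v => v >= 0 && v <= 100)
        .ToArray();

int[] single = ExtractValues("TAR:100");
int[] triple = ExtractValues("TAR:30|LED:30|ASO:40");
int[] outOfRange = ExtractValues("TAR:150|LED:50");

Console.WriteLine(string.Join(",", single));     // 100
Console.WriteLine(string.Join(",", triple));     // 30,30,40
Console.WriteLine(string.Join(",", outOfRange)); // 50
```

Unlike the patterns above, this version does not validate the overall shape of the string; it only picks out digits that directly follow a letter and a colon.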

Upvotes: 0
