Reputation: 94
I have the following example strings:
TAR:100
TAR:100|LED:50
TAR:30|LED:30|ASO:40
I need a regex that obtains the numeric values after the colon, which are always in the range 0 to 100 inclusive.
Applying the regex to each of the strings above should give:
for TAR:100, the result should be 100
for TAR:100|LED:50, the result should be the array [100,50]
for TAR:30|LED:30|ASO:40, the result should be the array [30,30,40]
The word before the colon can be of any length and may contain both upper- and lowercase letters.
I have tried the following, but it doesn't give the result I need:
String text = "TAR:100|LED:50";
String pattern = "\\|?([a-zA-Z]{1,}:)";
string[] values= Regex.Split(text, pattern);
If possible, the same regex should work whether the string is TAR:100 or TAR:100|LED:50.
Upvotes: 1
Views: 94
Reputation: 24137
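You added capturing parentheses (), and Regex.Split includes any text matched by a capturing group in the result array. That is why the label parts you want to remove are returned as well.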
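A small sketch of the effect described above, assuming C# top-level statements (.NET 5 or later). It runs the question's pattern and the same pattern without the capturing group over one input and prints both results.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "TAR:100|LED:50";

// The question's pattern: the parentheses form a capturing group.
string[] withCaptures = Regex.Split(text, @"\|?([a-zA-Z]{1,}:)");

// The same separator without a capturing group.
string[] withoutCaptures = Regex.Split(text, @"\|?[a-zA-Z]+:");

// The captured labels are inserted between the pieces.
Console.WriteLine(string.Join(",", withCaptures));    // ,TAR:,100,LED:,50
// Only the text between the separators is returned.
Console.WriteLine(string.Join(",", withoutCaptures)); // ,100,50
```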
Below is my solution, with a slightly changed regex.
Note that we need to start looping over the values at i = 1. This is caused purely by using Split on a string that starts with a split sequence; it has nothing to do with the regex itself.
Explanation: if we used a simpler str.Split to split by the separator "#", then "a#b#c" would produce ["a", "b", "c"], whereas "#b#c" would produce ["", "b", "c"]. In general, and by definition: if Split removes N separator occurrences from a string, the result is N+1 strings. Every string we deal with here has the form "#b#c" (it starts with a label such as "TAR:", which the pattern treats as a separator), so the first result is always empty.
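The "#" example can be checked directly with String.Split. This sketch assumes C# top-level statements and .NET Core 2.0 or later for the Split(char, StringSplitOptions) overload; it also shows that String.Split can drop the empty piece itself, an option Regex.Split does not offer.

```csharp
using System;

// Two '#' separators produce three pieces in both cases.
string[] plain = "a#b#c".Split('#');   // ["a", "b", "c"]
string[] leading = "#b#c".Split('#');  // ["", "b", "c"]: nothing precedes the first '#'

// String.Split can remove the empty piece for you.
string[] trimmed = "#b#c".Split('#', StringSplitOptions.RemoveEmptyEntries); // ["b", "c"]

Console.WriteLine(string.Join("|", plain));   // a|b|c
Console.WriteLine(string.Join("|", leading)); // |b|c
Console.WriteLine(string.Join("|", trimmed)); // b|c
```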
Accepting that as a given fact, the usable results start at i = 1:
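using System;
using System.Text.RegularExpressions;

var pattern = @"\|?[a-zA-Z]+:";   // optional "|" followed by the label and ":"
var testCases = new[] { "TAR:100", "TAR:100|LED:50", "TAR:30|LED:30|ASO:40" };
foreach (var text in testCases)
{
    string[] values = Regex.Split(text, pattern);
    // values[0] is always "" because every input starts with a separator
    for (var i = 1; i < values.Length; i++)
        Console.WriteLine(values[i]);
    Console.WriteLine("------------");
}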
Output:
100
------------
100
50
------------
30
30
40
------------
Working DotNetFiddle: https://dotnetfiddle.net/i9kH8n
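If you would rather not depend on starting at i = 1, one possible variation (not part of the answer above) is to filter out the empty strings with LINQ and parse the rest as integers. ParseValues is a hypothetical helper name, and the sketch assumes C# top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same separator pattern as in the answer above.
const string pattern = @"\|?[a-zA-Z]+:";

// Hypothetical helper: split, drop empty pieces, convert to int.
int[] ParseValues(string text) =>
    Regex.Split(text, pattern)
         .Where(s => s.Length > 0)     // removes the "" produced by the leading label
         .Select(s => int.Parse(s))
         .ToArray();

foreach (var input in new[] { "TAR:100", "TAR:100|LED:50", "TAR:30|LED:30|ASO:40" })
    Console.WriteLine(string.Join(",", ParseValues(input)));
// 100
// 100,50
// 30,30,40
```

Filtering by length also removes empty pieces that might appear elsewhere in malformed input, which may or may not be what you want.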
Upvotes: 1
Reputation: 163207
In .NET you can give two capture groups the same name, match the whole format of the string, and then read every value from that group's Group.Captures collection.
\b[a-zA-Z]+:(?<numbers>[0-9]+)(?:\|[a-zA-Z]+:(?<numbers>[0-9]+))*\b
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] strings = {
    "TAR:100",
    "TAR:100|LED:50",
    "TAR:30|LED:30|ASO:40"
};
// Both groups are named "numbers", so each repetition adds to the same Captures collection.
string pattern = @"\b[a-zA-Z]+:(?<numbers>[0-9]+)(?:\|[a-zA-Z]+:(?<numbers>[0-9]+))*\b";
foreach (String str in strings)
{
    Match match = Regex.Match(str, pattern);
    if (match.Success)
    {
        string[] result = match.Groups["numbers"].Captures.Select(c => c.Value).ToArray();
        Console.WriteLine(String.Join(',', result));
    }
}
Output
100
100,50
30,30,40
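The [0-9]+ groups accept any number, while the question states the values are always between 0 and 100. If you also want the regex to reject out-of-range input (an assumption; the question only says such values never occur), one option is to narrow the number part to 100|[1-9]?[0-9] and anchor the pattern with ^ and $. Extract is a hypothetical helper name, and this version also rejects leading zeros such as 05.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Assumption: only 0..100 is valid, and the whole string must be LABEL:VALUE(|LABEL:VALUE)*.
const string valuePart = "(?<numbers>100|[1-9]?[0-9])";
string pattern = $@"^[a-zA-Z]+:{valuePart}(?:\|[a-zA-Z]+:{valuePart})*$";

// Hypothetical helper: returns null when the input does not match the format.
int[]? Extract(string input)
{
    Match m = Regex.Match(input, pattern);
    if (!m.Success) return null;
    // Both groups share the name "numbers", so all values are in one Captures collection.
    return m.Groups["numbers"].Captures.Select(c => int.Parse(c.Value)).ToArray();
}

foreach (var s in new[] { "TAR:100", "TAR:30|LED:30|ASO:40", "TAR:150" })
{
    int[]? r = Extract(s);
    Console.WriteLine(r is null ? $"{s}: no match" : $"{s}: {string.Join(",", r)}");
}
// TAR:100: 100
// TAR:30|LED:30|ASO:40: 30,30,40
// TAR:150: no match
```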
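Another option is to use the \G anchor, which asserts that the match starts where the previous match ended, and to capture the value in group 1. The (?!^) lookahead stops the \G alternative from matching at the very start of the string, where no previous match exists, so the first value must be preceded by a label.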
\b(?:[a-zA-Z]+:|\G(?!^))([0-9]+)(?:\||$)
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] strings = {
    "TAR:100",
    "TAR:100|LED:50",
    "TAR:30|LED:30|ASO:40"
};
string pattern = @"\b(?:[a-zA-Z]+:|\G(?!^))([0-9]+)(?:\||$)";
foreach (String str in strings)
{
    MatchCollection matches = Regex.Matches(str, pattern);
    // Group 1 holds the numeric value of each match.
    string[] result = matches.Select(m => m.Groups[1].Value).ToArray();
    Console.WriteLine(String.Join(',', result));
}
Output
100
100,50
30,30,40
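To see what \G does on its own, this sketch (C# top-level statements assumed) compares a digit pattern with and without the anchor on a made-up input. With \G, matching stops at the first character that breaks the chain of adjacent matches.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// \G: each match must begin where the previous one ended (the first at the search start).
string[] contiguous = Regex.Matches("123ab45", @"\G[0-9]")
                           .Select(m => m.Value)
                           .ToArray();

// Without \G, every digit in the string is found.
string[] everyDigit = Regex.Matches("123ab45", "[0-9]")
                           .Select(m => m.Value)
                           .ToArray();

Console.WriteLine(string.Join(",", contiguous)); // 1,2,3
Console.WriteLine(string.Join(",", everyDigit)); // 1,2,3,4,5
```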
Upvotes: 0