Reputation: 4850
This little challenge just screams regular expressions to me, but so far I am stumped.
I have an arbitrary string that contains two numbers embedded in it. I need to extract those two numbers, which will be n and m digits long (n,m are unknown in advance). The format of the string is always
FixedWord[n digits]anotherfixedword[m digits]alotmorestuffontheend
The first number is of the format 1.2.3.4 (the number of digits varying), e.g. 5.3.20 or 5.3.10.1 or 5.4, and the second is a simpler 'm' digits (e.g. 25 or 2).
Example: "AppName5.2.6dbVer44Oracle.Group"
It shouts 'pattern matching' and hence "extraction using regexes". Can anyone guide me further?
TIA
Upvotes: 1
Views: 14382
Reputation: 112762
Simply look for the numbers, since you only care for the numbers and don't want to check the syntax of the whole input string.
// Regex.Matches returns a MatchCollection (there is no type named "Matches").
MatchCollection matches = Regex.Matches(input, @"\d+(\.\d+)*");
if (matches.Count >= 2) {
    string number1 = matches[0].Value;
    string number2 = matches[1].Value;
} else {
    // Less than two numbers found
}
The expression \d+(\.\d+)* means:
\d+      one or more digits.
( )*     repeat zero, one or more times.
\.\d+    one decimal point (escaped with \) followed by one or more digits.
and
\d       one digit.
( )      grouping.
+        repeat the expression to the left one or more times.
*        repeat the expression to the left zero, one or more times.
\        escapes characters that have a special meaning in regex.
.        any character (without escaping).
\.       period character (".").
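As a quick check of the explanation above, the pattern can be run against strings in the question's format. This is only a minimal sketch: the class name NumberDemo, the helper ExtractNumbers, and the second sample string are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class NumberDemo
{
    // Hypothetical helper: returns every plain or dotted number found in the input, in order.
    public static string[] ExtractNumbers(string input) =>
        Regex.Matches(input, @"\d+(\.\d+)*")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    static void Main()
    {
        string[] samples =
        {
            "AppName5.2.6dbVer44Oracle.Group",   // sample from the question
            "AppName5.3.10.1dbVer2Oracle.Group", // assumed variation of the same format
        };
        foreach (string s in samples)
        {
            // Prints "5.2.6 | 44", then "5.3.10.1 | 2".
            Console.WriteLine(string.Join(" | ", ExtractNumbers(s)));
        }
    }
}
```

If the trailing part of the string can itself contain digits, more than two matches come back, so taking only the first two (as in the Count check above) is the safer way to read the result.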
Upvotes: 0
Reputation: 31721
Keep it basic: specify a capture group ( ) that looks for a digit \d followed by zero or more (*) digits or periods from the set [\d.] (the set matches \d or a literal period):
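using System;
using System.Linq;
using System.Text.RegularExpressions;

var data = "AppName5.2.6dbVer44Oracle.Group";
var pattern = @"(\d[\d.]*)";
// Outputs:
// 5.2.6
// 44
// Join the values explicitly: in a plain console app, passing the sequence
// itself to Console.WriteLine prints its type name, not its elements.
Console.WriteLine(string.Join(Environment.NewLine,
    Regex.Matches(data, pattern)
         .OfType<Match>()
         .Select(mt => mt.Groups[1].Value)));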
Each match is one number from the input string. If the count of numbers in the input changes, the pattern does not fail; it simply reports however many numbers it finds.
Upvotes: 0
Reputation: 621
You could start from the following:
^[a-zA-Z]+((?:\d+\.)+\d+)[a-zA-Z]+(\d+).*$
I assumed that the fixed words are just made of letters and that you want to match the entire string. If you prefer, you could substitute the parts not in parentheses with the actual fixed words or change the character sets as desired. I recommend using a tool like https://regex101.com to fine-tune the expression.
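To illustrate the substitution mentioned above, here is a rough sketch with the letter classes replaced by fixed words. The words "AppName" and "dbVer" are assumptions taken from the question's sample string, the names AnchoredDemo and TryParse are hypothetical, and the last component of the first number is written as \d+ so that a multi-digit ending such as 5.3.20 also matches.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchoredDemo
{
    // Assumption: the fixed words are "AppName" and "dbVer", as in the question's sample.
    static readonly Regex Pattern =
        new Regex(@"^AppName((?:\d+\.)+\d+)dbVer(\d+).*$");

    // Hypothetical helper: succeeds only when the whole string fits the expected format.
    public static bool TryParse(string input, out string version, out string dbVersion)
    {
        Match m = Pattern.Match(input);
        version = m.Success ? m.Groups[1].Value : null;
        dbVersion = m.Success ? m.Groups[2].Value : null;
        return m.Success;
    }

    static void Main()
    {
        if (TryParse("AppName5.2.6dbVer44Oracle.Group", out string v, out string db))
            Console.WriteLine($"{v} / {db}"); // prints "5.2.6 / 44"
    }
}
```

Unlike simply scanning for numbers, this form rejects strings that do not follow the expected layout, which may or may not be what you want.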
Upvotes: 0
Reputation: 51430
The following pattern:
(\d+(?>\.\d+)*)\w+?(\d+)
Will match this:
AppName5.2.6dbVer44Oracle.Group
       \__________/   <-- match
       \___/     \/   <-- captures
It captures the two values you're interested in as groups 1 and 2.
Use it like this:
var match = Regex.Match(input, @"(\d+(?>\.\d+)*)\w+?(\d+)");
if (match.Success)
{
    var first = match.Groups[1].Value;
    var second = match.Groups[2].Value;
    // ...
}
Pattern explanation:
(            # Start of group 1
  \d+        # a series of digits
  (?>        # start of atomic group
    \.\d+    # dot followed by digits
  )*         # .. 0 to n times
)            # End of group 1
\w+?         # some word characters (as few as possible)
(\d+)        # a series of digits captured in group 2
Upvotes: 3