haughtonomous

Reputation: 4850

How to extract numbers from a string using regular expressions?

This little challenge just screams regular expressions to me, but so far I am stumped.

I have an arbitrary string that contains two numbers embedded in it. I need to extract those two numbers, which will be n and m digits long (n,m are unknown in advance). The format of the string is always

FixedWord[n digits]anotherfixedword[m digits]alotmorestuffontheend

The first number is in a dotted format such as 1.2.3.4, where the number of components varies (e.g. 5.3.20, 5.3.10.1 or 5.4), and the second is a plain run of m digits (e.g. 25 or 2).

eg "AppName5.2.6dbVer44Oracle.Group"

It shouts 'pattern matching' and hence "extraction using regexes". Can anyone guide me further?

TIA

Upvotes: 1

Views: 14382

Answers (5)

Olivier Jacot-Descombes

Reputation: 112762

Simply look for the numbers, since you only care about the numbers and don't need to validate the syntax of the whole input string.

MatchCollection matches = Regex.Matches(input, @"\d+(\.\d+)*");
if (matches.Count >= 2) {
    string number1 = matches[0].Value;
    string number2 = matches[1].Value;
} else {
    // Fewer than two numbers found
}

The expression \d+(\.\d+)* means:

\d+           one or more digits.
( )*         repeat zero, one or more times.
\.\d+       one decimal point (escaped with \) followed by one or more digits.

and

\d            one digit.
( )          grouping.
+              repeat the expression to the left one or more times.
*              repeat the expression to the left zero, one or more times.
\              escapes characters that have a special meaning in regex.
.              any character (without escaping).
\.            period character (".").
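The approach above can be wrapped into a small helper. This is only a sketch: the class name NumberExtractor and the method ExtractTwo are made up for illustration, and it assumes the first two numbers in the string are the ones you want.

```csharp
using System;
using System.Text.RegularExpressions;

static class NumberExtractor
{
    // Hypothetical helper: returns the first two numbers found in the input,
    // or null when fewer than two are present.
    public static (string First, string Second)? ExtractTwo(string input)
    {
        MatchCollection matches = Regex.Matches(input, @"\d+(\.\d+)*");
        if (matches.Count < 2)
            return null;
        return (matches[0].Value, matches[1].Value);
    }

    static void Main()
    {
        var result = ExtractTwo("AppName5.2.6dbVer44Oracle.Group");
        Console.WriteLine(result.HasValue
            ? $"{result.Value.First} / {result.Value.Second}"   // 5.2.6 / 44
            : "Fewer than two numbers found");
    }
}
```

Note that any digits inside the fixed words themselves would also be picked up as matches, so this only works while the fixed words contain no digits.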

Upvotes: 0

ΩmegaMan

Reputation: 31721

Keep it basic by specifying a capture group ( ) that looks for a digit \d followed by zero or more (*) digits or periods from the set [\d.] (the set matches \d or a literal period):

var data    = "AppName5.2.6dbVer44Oracle.Group";
var pattern = @"(\d[\d.]*)";

// Outputs:
// 5.2.6
// 44
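// Join the values first: passing the sequence itself to Console.WriteLine
// would print its type name rather than the matches.
Console.WriteLine(string.Join(Environment.NewLine,
                  Regex.Matches(data, pattern)
                       .OfType<Match>()
                       .Select(mt => mt.Groups[1].Value)));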

Each match is one number in the input. If the count of numbers in the string changes, the pattern will not fail; it simply reports however many numbers (1 to N) it finds.
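One thing to be aware of: because the set [\d.] accepts a period anywhere, a period that directly follows a number becomes part of the match. The sketch below compares this pattern with the stricter \d+(\.\d+)* on a made-up input (the string "AppName5.4.dbVer2" and the helper FindAll are assumptions for illustration, not from the question).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class PatternComparison
{
    // Hypothetical helper: returns every match of the pattern as a string.
    public static string[] FindAll(string input, string pattern) =>
        Regex.Matches(input, pattern)
             .OfType<Match>()
             .Select(m => m.Value)
             .ToArray();

    static void Main()
    {
        // Made-up input where a period directly follows the version number.
        var input = "AppName5.4.dbVer2";

        // The set-based pattern keeps the trailing period.
        Console.WriteLine(string.Join(", ", FindAll(input, @"\d[\d.]*")));     // 5.4., 2

        // The stricter pattern requires digits after every period.
        Console.WriteLine(string.Join(", ", FindAll(input, @"\d+(\.\d+)*")));  // 5.4, 2
    }
}
```

If your input can never have a period right after the first number, the simpler pattern is fine as it stands.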

Upvotes: 0

SukkoPera

Reputation: 621

You could start from the following:

^[a-zA-Z]+((?:\d+\.)+\d+)[a-zA-Z]+(\d+).*$

I assumed that the fixed words are just made of letters and that you want to match the entire string. If you prefer, you could substitute the parts not in parentheses with the actual fixed words or change the character sets as desired. I recommend using a tool like https://regex101.com to fine-tune the expression.
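As a rough sketch of substituting the actual fixed words, the version below uses "AppName" and "dbVer" from the sample string in the question. Those two words, the group names version and db, and the helper TryExtract are all assumptions for illustration; swap in whatever your real input uses.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchoredExtractor
{
    // "AppName" and "dbVer" come from the question's sample string;
    // replace them with the real fixed words.
    static readonly Regex Pattern = new Regex(
        @"^AppName(?<version>\d+(?:\.\d+)*)dbVer(?<db>\d+)");

    // Hypothetical helper: returns false when the input does not have the expected shape.
    public static bool TryExtract(string input, out string version, out string db)
    {
        Match m = Pattern.Match(input);
        version = m.Success ? m.Groups["version"].Value : "";
        db = m.Success ? m.Groups["db"].Value : "";
        return m.Success;
    }

    static void Main()
    {
        if (TryExtract("AppName5.2.6dbVer44Oracle.Group", out var version, out var db))
            Console.WriteLine($"{version} {db}"); // 5.2.6 44
    }
}
```

Spelling out the fixed words means a string in a different format is rejected instead of yielding two unrelated numbers.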

Upvotes: 0

Lucas Trzesniewski

Reputation: 51430

The following pattern:

(\d+(?>\.\d+)*)\w+?(\d+)

Will match this:

AppName5.2.6dbVer44Oracle.Group
       \__________/   <-- match
       \___/     \/   <-- captures

It captures the two values you're interested in as groups 1 and 2.

Use it like this:

var match = Regex.Match(input, @"(\d+(?>\.\d+)*)\w+?(\d+)");
if (match.Success)
{
    var first = match.Groups[1].Value;
    var second = match.Groups[2].Value;
    // ...
}

Pattern explanation:

(           # Start of group 1
  \d+       # a series of digits
  (?>       # start of atomic group
    \.\d+   #   dot followed by digits
  )*        # .. 0 to n times
)
\w+?        # some word characters (as few as possible)
(\d+)       # a series of digits captured in group 2
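If you would like to keep the commented layout above in your source, .NET can take it more or less verbatim through RegexOptions.IgnorePatternWhitespace, which ignores unescaped whitespace and treats # as the start of a comment. A minimal sketch, where the class name CommentedPattern and the method Extract are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

static class CommentedPattern
{
    // With IgnorePatternWhitespace, unescaped whitespace is ignored and
    // '#' starts a comment that runs to the end of the line.
    static readonly Regex Pattern = new Regex(@"
        (           # start of group 1
          \d+       # a series of digits
          (?>       # start of atomic group
            \.\d+   #   dot followed by digits
          )*        # .. 0 to n times
        )
        \w+?        # some word characters (as few as possible)
        (\d+)       # a series of digits captured in group 2
        ", RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns both captures, or null when there is no match.
    public static (string First, string Second)? Extract(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
            return null;
        return (m.Groups[1].Value, m.Groups[2].Value);
    }

    static void Main()
    {
        var r = Extract("AppName5.2.6dbVer44Oracle.Group");
        if (r.HasValue)
            Console.WriteLine($"{r.Value.First} {r.Value.Second}"); // 5.2.6 44
    }
}
```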

Upvotes: 3

Alex Voskresenskiy

Reputation: 2233

Try this:

\w*?([\d.]+)\w*?(\d+).*

Upvotes: 0
