handles

Reputation: 7853

Match only the nth occurrence using a regular expression

I have a string with 3 dates in it like this:

XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx

I want to select the 2nd date in the string, the 20180208 one.

Is there a way to do this purely in the regex, without having to resort to pulling out the 2nd match in code? I'm using C#, if that matters.

Thanks for any help.

Upvotes: 2

Views: 3209

Answers (4)

Cary Swoveland

Reputation: 110665

You could use the regular expression

^(?:.*?\d{8}_){1}.*?(\d{8})

to save the 2nd date to capture group 1.

Naturally, for n > 2, replace {1} with {n-1} to obtain the nth date. To obtain the 1st date use

^(?:.*?\d{8}_){0}.*?(\d{8})

C#'s regex engine performs the following operations.

^        # match the beginning of the string
(?:      # begin a non-capture group
  .*?    # match 0+ chars lazily
  \d{8}  # match 8 digits
  _      # match '_'
)        # end non-capture group
{n}      # execute non-capture group n (n >= 0) times
.*?      # match 0+ chars lazily     
(\d{8})  # match 8 digits in capture group 1
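The breakdown above can be wrapped in a small helper that builds the pattern for any n. This is only a sketch: the class name NthDate and the method Find are hypothetical, and it assumes every date except possibly the one being captured is followed by an underscore, as in the sample string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NthDate
{
    // Hypothetical helper: returns the nth (1-based) 8-digit date, or null if there is none.
    public static string Find(string input, int n)
    {
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));
        // Skip n-1 dates (each followed by '_'), then capture the next 8 digits.
        string pattern = @"^(?:.*?\d{8}_){" + (n - 1) + @"}.*?(\d{8})";
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string text = "XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx";
        Console.WriteLine(Find(text, 2)); // 20180208
    }
}
```

Returning null when the match fails lets the caller tell "no nth date" apart from an empty capture.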

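The important thing to note is that the first instance of .*?, because it is lazy, consumes as few characters as possible: it expands one character at a time until the text that follows is 8 digits and an underscore. Delimiting the digits with underscores on both sides, as in _\d{8}_, is what ensures that they are neither preceded nor followed by another digit. For example, in the string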

_1234abcd_efghi_123456789_12345678_ABC

capture group 1 in (.*?)_\d{8}_ will contain "_1234abcd_efghi_123456789".
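The lazy-matching example can be checked with a few lines of C#. This is a minimal sketch; the class name LazyDemo and the method Prefix are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyDemo
{
    // Returns what group 1 of (.*?)_\d{8}_ captures, or null when there is no match.
    public static string Prefix(string input)
    {
        Match m = Regex.Match(input, @"(.*?)_\d{8}_");
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // The 9-digit run is skipped because its first 8 digits are followed by '9', not '_'.
        Console.WriteLine(Prefix("_1234abcd_efghi_123456789_12345678_ABC"));
        // prints _1234abcd_efghi_123456789
    }
}
```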

Upvotes: 0

Jan

Reputation: 43169

You could use

^(?:[^_]+_){2}(\d+)

and take the first capture group; see a demo on regex101.com.

Broken down, this says

^              # start of the string
(?:[^_]+_){2}  # one or more non-underscore chars followed by '_', twice
(\d+)          # capture digits

C# demo:

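using System;
using System.Text.RegularExpressions;

var pattern = @"^(?:[^_]+_){2}(\d+)"; // skip two '_'-terminated fields, then capture digits
var text = "XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx";
var match = Regex.Match(text, pattern); // Match never returns null, so check Success
Console.WriteLine(match.Success ? match.Groups[1].Value : "no match"); // => 20180208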

Upvotes: 3

Mohammad Ali

Reputation: 561

You can use System.Text.RegularExpressions.Regex, as in the following example:

Regex regex = new Regex(@"^(?:[^_]+_){2}(\d+)"); //Expression from Jan's answer just showing how to use C# to achieve your goal
GroupCollection groups = regex.Match("XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx").Groups;
if (groups.Count > 1)
{
    Console.WriteLine(groups[1].Value);
}

Upvotes: -1

AJP

Reputation: 43

Try this one

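MatchCollection matches = Regex.Matches(sInputLine, @"\d{8}");

// Index 1 is the 2nd match; guard against inputs with fewer than two dates
string sSecond = matches.Count > 1 ? matches[1].Value : string.Empty;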

Upvotes: 0
