XandrUu

Reputation: 1179

Regular Expression Pattern \K Alternatives in C#

I have a regex that I tested on http://gskinner.com/RegExr/ and it worked, but it failed when I used it in my C# application.

My regex: (?<!\d)\d{6}\K\d+(?=\d{4}(?!\d))
Text: 4000751111115425
Result: 111111

What is wrong with my regex?

Upvotes: 9

Views: 4491

Answers (2)

Wiktor Stribiżew

Reputation: 627103

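In general, the \K operator (which discards all text matched so far from the match memory buffer) can be emulated in .NET with two techniques: a lookbehind that holds the left-hand context, or a capturing group whose value you read instead of the whole match.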

For example,

  • PCRE a+b+c+=\K\d+ = .NET (?<=a+b+c+=)\d+ or a+b+c+=(\d+) (and grab the Group 1 value)
  • PCRE ^[^][]+\K.* = .NET (?<=^[^][]+)(?:\[.*)?$ or (better here) ^[^][]+(.*)

The problem with the second example is that [^][]+ and .* can match the same text (the patterns overlap). Since there is no clear boundary between the two, a plain lookbehind does not work on its own and needs additional tricks, such as the (?:\[.*)?$ part in the pattern above.

The capturing group approach is universal and should work in all situations.
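As a rough illustration of the two techniques and of the overlap problem, here is a minimal sketch. The class name KEmulation, its method names, and the sample inputs "aabbcc=123" and "abc[def]" are hypothetical and chosen only for this example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class KEmulation
{
    // Technique 1: move the left-hand context into a lookbehind.
    // .NET allows unbounded quantifiers (a+, b+, ...) inside a lookbehind.
    public static string ViaLookbehind(string input) =>
        Regex.Match(input, @"(?<=a+b+c+=)\d+").Value;

    // Technique 2: match the context normally and read Group 1.
    public static string ViaGroup(string input) =>
        Regex.Match(input, @"a+b+c+=(\d+)").Groups[1].Value;

    // Naive lookbehind translation of ^[^][]+\K.* : because [^][]+ and .*
    // overlap, the lookbehind is already satisfied after one character.
    public static string NaiveLookbehind(string input) =>
        Regex.Match(input, @"(?<=^[^][]+).*").Value;

    // Capturing-group translation of the same pattern.
    public static string BracketViaGroup(string input) =>
        Regex.Match(input, @"^[^][]+(.*)").Groups[1].Value;
}
```

With these assumed inputs, ViaLookbehind("aabbcc=123") and ViaGroup("aabbcc=123") both give 123. NaiveLookbehind("abc[def]") gives bc[def], because the match starts right after the first character, while BracketViaGroup("abc[def]") gives [def].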

Since \K makes the regex engine "forget" the part of a match consumed so far, the best approach here is to use a capturing group to grab the part of a match you need to obtain after the left-hand context:

using System;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        var text = "Text  4000751111115425";
        // Regex.Match never returns null; Groups[1].Value is "" when nothing matches.
        var result = Regex.Match(text, @"(?<!\d)\d{6}(\d+)(?=\d{4}(?!\d))").Groups[1].Value;
        Console.WriteLine($"Result: '{result}'");
    }
}

Pattern details:

  • (?<!\d) - a left-hand digit boundary
  • \d{6} - six digits
  • (\d+) - Capturing group 1: one or more digits
  • (?=\d{4}(?!\d)) - a positive lookahead that matches a location immediately followed by four digits that are not themselves followed by another digit.

Upvotes: 0

Rawling

Reputation: 50144

The issue you are having is that .NET regular expressions do not support \K, "discard what has been matched so far".
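One way to see this from code is to check whether the Regex constructor accepts the pattern at all. The sketch below is only an illustration: the class KSupportCheck and the method IsValidPattern are hypothetical names. It relies on the Regex constructor throwing an ArgumentException for a pattern it cannot parse, which is how an unrecognized escape such as \K is expected to be reported.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class KSupportCheck
{
    // Returns true if the .NET regex engine accepts the pattern,
    // false if the constructor rejects it with a parse error.
    public static bool IsValidPattern(string pattern)
    {
        try
        {
            _ = new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            // Thrown for patterns the parser cannot handle,
            // e.g. an unrecognized escape sequence.
            return false;
        }
    }
}
```

Under that assumption, IsValidPattern(@"(?<!\d)\d{6}\K\d+(?=\d{4}(?!\d))") returns false, while the same pattern with \K\d+ replaced by (\d+) returns true.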

I believe your regex translates as "match any run of more than ten digits, taking as many digits as possible, and discard the first six and the last four".

I believe that the .NET-compliant regex

(?<=\d{6})\d+(?=\d{4})

achieves the same thing. Note that the negative lookbehind/lookahead checks for "no more digits" are not necessary, because \d+ is greedy: the engine already tries to match as many digits as possible.
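A minimal sketch of using this pattern from C# follows. The class name MiddleDigits and the method name Extract are hypothetical; the sample input is the one from the question.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class MiddleDigits
{
    // Returns the digits between the first six and the last four of a
    // digit run, or "" when the input has no run of at least eleven digits.
    public static string Extract(string input) =>
        Regex.Match(input, @"(?<=\d{6})\d+(?=\d{4})").Value;
}
```

For the question's text, MiddleDigits.Extract("4000751111115425") gives 111111. The whole match is the wanted value here, so no group lookup is needed.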

Upvotes: 8
