JA1

Reputation: 568

C# Regex to extract after capture group numbers only

I'm not sure what I'm doing wrong. I have the following:

(?:[A-Z]{2}\d{2}\s)

This is because my string always starts with two uppercase letters followed by two digits. After that, the data is mixed with words, and I only want the numbers. I want to take this:

AB12 (1,2,3 words, 4,5,6,7,8,9)

and obtain this:

AB12 (1,2,3,4,5,6,7,8,9)

I was trying

(?:[A-Z]{2}\d{2}\s)([0-9]+)

However, this is not working. Was I even close to achieving my goal?

Upvotes: 1

Views: 56

Answers (1)

Wiktor Stribiżew

Reputation: 626758

To remove any character that is not a digit or a comma, you can use the negated character class [^,\d], and use the (?<=\([^()]*) lookbehind and the (?=[^()]*\)) lookahead to assert that the position is inside parentheses:

(?<=\([^()]*)\s*[^,\d]+(?=[^()]*\))

See the regex demo

The \s* consumes any optional whitespace (zero or more characters) before the non-numeric text.
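To see what the pattern above actually consumes, here is a minimal sketch that lists each match before removing it. The class name LookaroundSketch and the helper KeepDigitsAndCommas are hypothetical names chosen for illustration; the pattern and the sample string are taken from this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundSketch
{
    // Pattern from the answer: text that is neither a digit nor a comma,
    // located between "(" and ")" with no nested parentheses.
    private const string Pattern = @"(?<=\([^()]*)\s*[^,\d]+(?=[^()]*\))";

    // Hypothetical helper (not part of the original answer):
    // deletes every match of the pattern from the input.
    public static string KeepDigitsAndCommas(string input)
    {
        return Regex.Replace(input, Pattern, string.Empty);
    }

    public static void Main()
    {
        var input = "AB12 (1,2,3 words, 4,5,6,7,8,9)";

        // Show each match; the brackets make matched whitespace visible.
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            Console.WriteLine("[" + m.Value + "] at index " + m.Index);
        }

        // Remove the matches and print the cleaned string.
        Console.WriteLine(KeepDigitsAndCommas(input));
    }
}
```

With the sample string there should be two matches: " words" (the \s* takes the leading space) and the single space after the following comma (there the \s* backtracks to zero width and [^,\d]+ matches the space itself). A string with no opening parenthesis is left unchanged, because the lookbehind can never succeed.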

If you need to make the context more precise by requiring your initial subpattern, add it to the lookbehind:

(?<=^[A-Z]{2}\d{2}\s+\([^()]*)\s*[^,\d]+(?=[^()]*\))
    ^^^^^^^^^^^^^^^^^

A C# demo:

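using System;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        var str = "AB12 (1,2,3 words, 4,5,6,7,8,9)";
        // The lookbehind requires the "AB12 (" style prefix at the start of the string;
        // .NET allows quantifiers such as \s+ and * inside a lookbehind.
        var pat = @"(?<=^[A-Z]{2}\d{2}\s+\([^()]*)\s*[^,\d]+(?=[^()]*\))";
        // Delete every run of non-digit, non-comma text inside the parentheses.
        var res = Regex.Replace(str, pat, string.Empty);
        Console.WriteLine(res); // prints: AB12 (1,2,3,4,5,6,7,8,9)
    }
}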

Upvotes: 1
