Kamil Turowski

Reputation: 465

Regex to match alphanumeric except specific substring

Edit: MANDATORY CONDITION: Regex has to be inserted into the following statement:

Regex regex = new Regex("<REGEX_STRING>");
val = regex.Matches(val).Cast<Match>().Aggregate("", (s, e) => s + e.Value, s => s);

I found out that I can't use the Regex.Replace() method suggested in the answer below.

I am looking for a RegEx that would have to follow two conditions:

  1. accept only a-z, A-Z, 0-9 and \s (one or more), and drop _ (which is why \w is not an option)

  2. [!] exclude every occurrence of the substring {sq}, anywhere inside the string

* {sq} is literally this 4-character string, not shorthand for some ASCII character!


What I have so far is:

\b(?!sq)[a-zA-Z0-9 ]*

but this RegEx cuts off everything once _ shows up, and it also removes the whole [sq] (where only the brackets should go). For example, for the input string:

test[sq]uirrel{sq}_things I should get testsquirrelthings, but what I get is: testuirrel


A small table of inputs and expected outputs:

Input string            | Expected output
Na#me                   | Name
M2a_ny                  | M2any
Vari{sq}o@us            | Various
test [sq]uirrel h23ere! | test squirrel h23ere

I would really appreciate any help, it's the most complicated RegEx I have ever come across 🙄

Upvotes: 4

Views: 365

Answers (1)

Wiktor Stribiżew

Reputation: 626690

The problem is that .NET regex has no construct for matching "any text except a given multicharacter sequence": a negated character class can only exclude single characters.

You will have to use a terrible workaround like

((?:(?!{sq})[A-Za-z0-9\s])+)|{sq}

and you will need to get Group 1 values. See the .NET regex demo. Here is a C# demo:

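using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var texts = new List<string> { "Na#me", "M2a_ny", "Vari{sq}o@us", "test [sq]uirrel h23ere!" };
// Group 1 captures runs of allowed characters; the second alternative consumes {sq} without capturing it.
var pattern = @"((?:(?!{sq})[A-Za-z0-9\s])+)|{sq}";
foreach (var text in texts) {
    // Concatenate only the Group 1 values, so matches of the bare {sq} alternative add nothing.
    var result = Regex.Matches(text, pattern).Cast<Match>()
        .Aggregate("", (s, e) => s + e.Groups[1].Value, s => s);
    Console.WriteLine(result);
}
// => Name, M2any, Various, test squirrel h23ere (one per line)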
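To illustrate why Group 1 is needed rather than the whole match value, here is a minimal sketch that lists every match for one of the sample inputs. It reuses the pattern above as-is; the variable names (matches, wholeMatches, groupOnly) are only illustrative. The {sq} alternative has to consume the substring, because otherwise the first alternative would start a new match at the s and pick up "sq". That also means {sq} appears among the matches, so concatenating e.Value puts it back into the result.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Pattern from the answer: Group 1 captures allowed characters,
// the second alternative consumes the literal {sq} without capturing it.
var pattern = @"((?:(?!{sq})[A-Za-z0-9\s])+)|{sq}";
var matches = Regex.Matches("Vari{sq}o@us", pattern).Cast<Match>().ToList();

// Show the whole match next to its Group 1 value.
foreach (var m in matches)
    Console.WriteLine($"Value='{m.Value}' Group1='{m.Groups[1].Value}'");

// Joining whole match values keeps {sq}; joining Group 1 values drops it.
string wholeMatches = string.Concat(matches.Select(m => m.Value));
string groupOnly = string.Concat(matches.Select(m => m.Groups[1].Value));
Console.WriteLine(wholeMatches); // Vari{sq}ous
Console.WriteLine(groupOnly);    // Various
```

For the match that covers {sq}, Group 1 does not participate and its value is an empty string, which is why the Aggregate call has to read e.Groups[1].Value instead of e.Value.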

A better, Regex.Replace-based solution
You can remove {sq} and every character that is not a letter, digit, or whitespace using

Regex.Replace(text, @"{sq}|[^a-zA-Z0-9\s]", "")
Regex.Replace(text, @"{sq}|[^\p{L}\p{N}\s]", "")

The \p{L} / \p{N} version can be used to support any Unicode letters/digits.
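A minimal sketch of both variants applied to the sample inputs from the question. Clean and CleanUnicode are hypothetical helper names introduced here for illustration only, and the last input is a made-up string with non-ASCII letters to show where the two patterns differ.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: ASCII-only variant of the pattern above.
string Clean(string text) => Regex.Replace(text, @"{sq}|[^a-zA-Z0-9\s]", "");

// Hypothetical helper: Unicode-aware variant (\p{L} = any letter, \p{N} = any number).
string CleanUnicode(string text) => Regex.Replace(text, @"{sq}|[^\p{L}\p{N}\s]", "");

// The four sample inputs from the question.
foreach (var input in new[] { "Na#me", "M2a_ny", "Vari{sq}o@us", "test [sq]uirrel h23ere!" })
    Console.WriteLine(Clean(input));

// Made-up input: the ASCII variant strips the accented letters, the Unicode variant keeps them.
Console.WriteLine(Clean("żółw{sq}!"));
Console.WriteLine(CleanUnicode("żółw{sq}!"));
```

The {sq} alternative is listed first so that the 4-character substring is removed as a whole before the negated character class gets a chance to strip only its braces.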

See the .NET regex demo.

Upvotes: 3
