Matt

Reputation: 2682

Replace all alphanumeric characters in a string except pattern

I'm trying to obfuscate a string but need to preserve a couple of patterns. Basically, every alphanumeric character needs to be replaced with a single character (say 'X'), but the following (example) patterns need to be preserved (note that each pattern has a single space at the beginning).

I've looked through a few samples on negative lookaheads/lookbehinds, but still haven't had any luck with this (I'm only testing QQQ so far).

var test = @"""SOME TEXT       AB123 12XYZ QQQ""""empty""""empty""1A2BCDEF";
var regex = new Regex(@"((?!QQQ)(?<!\sQ{1,3}))[0-9a-zA-Z]");            
var result = regex.Replace(test, "X");  

The correct result should be:

"XXXX XXXX       XXXXX XXXXX QQQ""XXXXX""XXXXX"XXXXXXXX

This works for an exact match, but fails for something like ' QQR"', which returns:

"XXXX XXXX       XXXXX XXXXX XQR""XXXXX""XXXXX"XXXXXXXX

Upvotes: 5

Views: 900

Answers (2)

Ondrej Janacek

Reputation: 12626

Here's a non-regex solution. It works quite nicely, although it fails when a pattern occurs more than once in the input sequence; that would need a better algorithm for finding occurrences. You can compare it with a regex solution on large strings.

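// Needs: using System; using System.Collections.Generic; using System.Text;
// As an extension method, this must be declared inside a static class.
public static string ReplaceWithPatterns(this string input, IEnumerable<string> patterns, char replacement)
{
    // Mask letters and digits only; whitespace and punctuation stay as they are.
    var result = new StringBuilder(input.Length);
    foreach (var c in input)
        result.Append(char.IsLetterOrDigit(c) ? replacement : c);

    foreach (var p in patterns)
    {
        // Only the first occurrence of each pattern is restored.
        var index = input.IndexOf(p, StringComparison.Ordinal);
        if (index < 0) // -1 means "not found"; index 0 is a valid match
            continue;

        // Overwrite in place instead of inserting, so the length is unchanged.
        for (var i = 0; i < p.Length; i++)
            result[index + i] = p[i];
    }

    return result.ToString();
}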
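The "better algorithm for finding occurrences" mentioned above could look roughly like the sketch below. It is only an illustration: the `Obfuscator` class and the `MaskExceptPatterns` name are made up here, and it assumes "alphanumeric" means ASCII letters and digits, as in the question's `[0-9a-zA-Z]`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class Obfuscator
{
    // Hypothetical helper: masks ASCII letters and digits, then restores
    // every occurrence of each pattern at its original position.
    public static string MaskExceptPatterns(string input, IEnumerable<string> patterns, char replacement)
    {
        var result = new StringBuilder(input.Length);
        foreach (var c in input)
            result.Append(IsAsciiAlphanumeric(c) ? replacement : c);

        foreach (var p in patterns)
        {
            if (string.IsNullOrEmpty(p))
                continue; // an empty pattern would match at every index

            // Walk through all occurrences, not just the first one.
            var index = input.IndexOf(p, StringComparison.Ordinal);
            while (index >= 0)
            {
                for (var i = 0; i < p.Length; i++)
                    result[index + i] = input[index + i];
                index = input.IndexOf(p, index + p.Length, StringComparison.Ordinal);
            }
        }

        return result.ToString();
    }

    private static bool IsAsciiAlphanumeric(char c) =>
        (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

public static class Program
{
    public static void Main()
    {
        // The pattern " QQQ" appears twice in this input.
        var test = @"""SOME TEXT AB123 QQQ""""empty"" QQQ""1A2B";
        Console.WriteLine(Obfuscator.MaskExceptPatterns(test, new[] { " QQQ" }, 'X'));
        // prints "XXXX XXXX XXXXX QQQ""XXXXX" QQQ"XXXX
    }
}
```

Restoring characters from `input` by index keeps the result the same length as the input, which is the main difference from building the result with `Insert`.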

Upvotes: 2

Casimir et Hippolyte

Reputation: 89629

You can use this:

var regex = new Regex(@"((?> QQQ|[^A-Za-z0-9]+)*)[A-Za-z0-9]");            
var result = regex.Replace(test, "$1X");

The idea is to match everything that must be preserved first and to put it in a capturing group.

Since the target characters are always preceded by zero or more things that must be preserved, you only need to write this capturing group before [A-Za-z0-9] and put it back with $1 in the replacement string.
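To see the capture-group idea in action, here is a small self-contained sketch that runs the regex above against the question's test string. It also includes a variant that is not part of this answer: an alternation combined with a `MatchEvaluator`, where a protected pattern is matched first and returned unchanged. The method names and the `keep` group name are made up for illustration. One thing worth checking in your own data: because the capture-group regex needs an alphanumeric character after the preserved text, a pattern that has no alphanumeric character anywhere after it (for example at the very end of the string) appears to get masked by it, while the alternation variant keeps it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Program
{
    // The answer's approach: capture everything that must be preserved in
    // group 1, then re-emit it in front of the replacement character.
    private static readonly Regex CaptureRegex =
        new Regex(@"((?> QQQ|[^A-Za-z0-9]+)*)[A-Za-z0-9]");

    // Alternative sketch (an assumption, not from the answer): match either a
    // protected pattern or one alphanumeric character, and decide per match.
    private static readonly Regex AlternationRegex =
        new Regex(@"(?<keep> QQQ)|[A-Za-z0-9]");

    public static string MaskWithCapture(string input) =>
        CaptureRegex.Replace(input, "$1X");

    public static string MaskWithEvaluator(string input) =>
        AlternationRegex.Replace(input, m => m.Groups["keep"].Success ? m.Value : "X");

    public static void Main()
    {
        var test = @"""SOME TEXT       AB123 12XYZ QQQ""""empty""""empty""1A2BCDEF";

        // Both calls should print the result the question asks for.
        Console.WriteLine(MaskWithCapture(test));
        Console.WriteLine(MaskWithEvaluator(test));

        // A protected pattern with no alphanumeric character after it.
        Console.WriteLine(MaskWithCapture("AB QQQ\""));
        Console.WriteLine(MaskWithEvaluator("AB QQQ\""));
    }
}
```

Adding more protected patterns means extending the alternation inside the group (for the answer's regex) or inside `keep` (for the variant), with longer patterns listed before any shorter ones they start with.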

Upvotes: 4
