Daniel Hollinrake

Reputation: 1778

Using Regex Replace when looking for un-escaped characters

I've got a requirement that is basically this. If I have a string of text such as

"There once was an 'ugly' duckling but it could 
never have been \'Scarlett\' Johansen"

then I'd like to match the quotes that haven't already been escaped. These would be the ones around 'ugly' not the ones around 'Scarlett'.

I've spent quite a while on this using a little C# console app to test things and have come up with the following solution.

private static void RegexFunAndGames() {

  string result;
  string sampleText = @"Mr. Grant and Ms. Kelly  starred in the film \'To Catch A Thief' but not in 'Stardust' because they'd stopped acting by then";
  string rePattern = @"\\'";
  string replaceWith = "'";

  Console.WriteLine(sampleText);

  Regex regEx = new Regex(rePattern);
  result = regEx.Replace(sampleText, replaceWith);

  result = result.Replace("'", @"\'");

  Console.WriteLine(result);
}

Basically, what I've done is a two-step process: find the quotes that have already been escaped, un-escape them, then escape every quote again. It feels a bit clumsy and I suspect there is a better way.

Testing Information

I got two really good answers so I thought it worth running a test to see which runs better. I have these two functions:

    private static string RegexReplace(string sampleText) {
        Regex regEx = new Regex("(?<!\\\\)'");
        return regEx.Replace(sampleText, "\\'");           
    }

    private static string ReplaceTest(string sampleText) {
        return sampleText.Replace(@"\'", "'").Replace("'", @"\'");
    }

And I call them from the Main method in a console app:

    static void Main(string[] args) {

        string sampleText = @"Mr. Grant and Ms. Kelly  starred in the film \'To Catch A Thief'  but not in 'Stardust' because they'd stopped acting by then.";
        string testReplace = string.Empty;
        System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();

        sw.Start();
        for (int i = 1000000; i > 0; i--) {
            testReplace = ReplaceTest(sampleText);
        }

        sw.Stop();
        Console.WriteLine("This method took '" + sw.ElapsedMilliseconds.ToString() + "'");

        sw.Reset();
        sw.Start();
        for (int i = 1000000; i > 0; i--) {
            testReplace = RegexReplace(sampleText);
        }

        sw.Stop();
        Console.WriteLine("This method took '" + sw.ElapsedMilliseconds.ToString() + "'");
    }

The method ReplaceTest takes 2068 milliseconds. The method RegexReplace takes 9372 milliseconds. I've run this test a few times and ReplaceTest always comes out faster.
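One caveat about the comparison above: RegexReplace constructs a new Regex object on every call, so part of what is being timed may be construction rather than matching. Below is a minimal sketch of a variant that builds the Regex once and reuses it. The class name, the field name, the method name RegexReplaceCached and the use of RegexOptions.Compiled are my own assumptions, not part of the original test, and I have not measured how it compares.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class CachedRegexBenchmark
{
    // Hypothetical cached instance: built once, reused for every call.
    // RegexOptions.Compiled is an assumption; it can be dropped.
    private static readonly Regex UnescapedQuote =
        new Regex(@"(?<!\\)'", RegexOptions.Compiled);

    // Same behaviour as RegexReplace above, minus the per-call construction.
    public static string RegexReplaceCached(string sampleText)
    {
        return UnescapedQuote.Replace(sampleText, @"\'");
    }

    public static void Main()
    {
        string sampleText = @"Mr. Grant and Ms. Kelly  starred in the film \'To Catch A Thief'  but not in 'Stardust' because they'd stopped acting by then.";
        string testReplace = string.Empty;

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 1000000; i > 0; i--)
        {
            testReplace = RegexReplaceCached(sampleText);
        }
        sw.Stop();

        Console.WriteLine("RegexReplaceCached took '" + sw.ElapsedMilliseconds + "' ms");
    }
}
```

Labelling each timing line with the method name, as done here, also makes the console output easier to read when several variants are run back to back.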

Upvotes: 4

Views: 2012

Answers (3)

NeverHopeless

Reputation: 11233

I am surprised that you are using RegEx for this. Why not simply use:

string result = sampleText.Replace(@"\'", "'").Replace("'", @"\'");

This will escape all the unescaped '.

It first un-escapes every escaped ' (single quote), then escapes all of them.

That said, if RegEx is a requirement, then the answer you have already said you will accept is the correct solution.
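To make the two steps concrete, here is a minimal sketch that wraps the one-liner in a helper and runs it on a shortened sample. The class name and the helper name EscapeQuotes are hypothetical, chosen only for illustration.

```csharp
using System;

public static class TwoStepEscapeDemo
{
    // Hypothetical helper wrapping the two chained Replace calls.
    public static string EscapeQuotes(string text)
    {
        // Step 1: turn every already-escaped \' back into a plain '.
        string unescaped = text.Replace(@"\'", "'");

        // Step 2: escape every quote, so each ends up escaped exactly once.
        return unescaped.Replace("'", @"\'");
    }

    public static void Main()
    {
        string sample = @"in the film \'To Catch A Thief' but not in 'Stardust'";
        Console.WriteLine(EscapeQuotes(sample));
        // prints: in the film \'To Catch A Thief\' but not in \'Stardust\'
    }
}
```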

Upvotes: 3

Sergey Kalinichenko

Reputation: 726579

You can use a negative lookbehind to make sure that the quote is not escaped: the expression below

(?<!\\)'

matches a single quote unless it is immediately preceded by a backslash.

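Note that backslashes that go into regular (non-verbatim) C# string constants must be doubled, which is why the pattern appears with four backslashes in the code below.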

var sampleText = @"Mr. Grant and Ms. Kelly  starred in the film \'To Catch A Thief' but not in 'Stardust' because they'd stopped acting by then";
var regEx = new Regex("(?<!\\\\)'");
var result = regEx.Replace(sampleText, "\\'");
Console.WriteLine(result);

The above prints

Mr. Grant and Ms. Kelly  starred in the film \'To Catch A Thief\' but not in \'Stardust\' because they\'d stopped acting by then
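If the doubled backslashes are hard to read, the same pattern can be written as a verbatim string literal. The following is a sketch only: the class name and the helper name EscapeUnescapedQuotes are hypothetical, and it uses the static Regex.Replace overload rather than a Regex instance.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Hypothetical helper: escapes each ' not already preceded by a backslash.
    public static string EscapeUnescapedQuotes(string text)
    {
        // Verbatim literal, so the regex engine sees (?<!\\)' exactly as typed.
        // In a .NET replacement string only $ is special, so \' is inserted as-is.
        return Regex.Replace(text, @"(?<!\\)'", @"\'");
    }

    public static void Main()
    {
        // Both literals produce the same pattern text.
        Console.WriteLine("(?<!\\\\)'" == @"(?<!\\)'"); // prints: True

        Console.WriteLine(EscapeUnescapedQuotes(@"\'To Catch A Thief' but they'd"));
        // prints: \'To Catch A Thief\' but they\'d
    }
}
```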

Link to ideone.

Upvotes: 4

Fernando Martinez

Reputation: 197

You could use

    string rePattern = @"[\\'|\']"; 

instead.

Upvotes: -1
