Reputation: 8780

When escaping characters in a string, is it faster to test for the need to do so first?

In cases where there aren't any characters to be escaped, would

if (s.Contains("\"")) 
    s = s.Replace("\"", "\"\"");

really run any faster than

s = s.Replace("\"", "\"\"");

The Contains method has to search through the string just like the Replace method does, and if the Replace method doesn't find anything to escape then I'd think it shouldn't really take any longer than the Contains method takes to look through the string. But if there is a character to escape then you're having to search through the string twice.

Upvotes: 0

Answers (7)

Feng Yuan

Reputation: 707

New version:

        string test = "A quick brown fox jumps over a lazy dog.";

        int count = 1000 * 1000;

        Stopwatch watch = new Stopwatch();

        for (int i = 0; i < 4; i++)
        {
            string result = String.Empty;

            watch.Restart();

            for (int c = 0; c < count; c++)
            {
                switch (i)
                {
                    case 0: // warmup
                        break;

                    case 1:
                        if (test.Contains("\""))
                        {
                            result = test.Replace("\"", "\"\"");
                        }
                        break;

                    case 2:
                        result = test.Replace("\"", "\"\"");
                        break;

                    case 3:
                        if (test.IndexOf('\"') >= 0)
                        {
                            result = test.Replace("\"", "\"\"");
                        }
                        break;
                }
            }

            watch.Stop();

            Console.WriteLine("Test {0,16} {1,7:N3} ms {2}", 
                new string[]{"Base","Contains-Replace","Replace","IndexOf-Replace"}[i],
                watch.Elapsed.TotalMilliseconds,
                result);

Result:

Test             Base   3.026 ms
Test Contains-Replace 284.780 ms
Test          Replace 214.503 ms
Test  IndexOf-Replace  64.447 ms

So Contains(string) itself is quite slow. The reason is due to NLS (nature language processing API. But IndexOf(char) is much faster.

Upvotes: 0

Feng Yuan

Reputation: 707

Andre Calil's micro-benchmark modified:

        List<string> StringList = new List<string>();

        for (int i = 0; i < 10000; i++)
        {
            StringList.Add(DateTime.Now.Ticks + " abc ");
        }

        string temp;

        KeyValuePair<string, string>[] StringsToReplace = new KeyValuePair<string, string>[6];

        // Use array to avoid dictionary access cost
        StringsToReplace[0] = new KeyValuePair<string,string>("1", ".1.");
        StringsToReplace[1] = new KeyValuePair<string,string>("a", "z");
        StringsToReplace[2] = new KeyValuePair<string,string>("b", "x");
        StringsToReplace[3] = new KeyValuePair<string,string>("c", "v");
        StringsToReplace[4] = new KeyValuePair<string,string>("d", "u");
        StringsToReplace[5] = new KeyValuePair<string,string>("e", "t");

        int TotalIterations = 100;
        Stopwatch stopWatch1 = new Stopwatch();
        Stopwatch stopWatch2 = new Stopwatch();

        GC.Collect(); // remove influence of garbage objects

        for (int j = 0; j <= TotalIterations; j++)
        {
            stopWatch1.Start(); // StopWatch Start/Stop does its own accumation

            for (int i = 0; i < StringList.Count; i++)
            {
                for (int k = 0; k < StringsToReplace.Length; k++)
                {
                    temp = StringList[i].Replace(StringsToReplace[k].Value, StringsToReplace[k].Key);
                }
            }

            stopWatch1.Stop();

            stopWatch2.Start();

            for (int i = 0; i < StringList.Count; i++)
            {
                for (int k = 0; k < StringsToReplace.Length; k++)
                {
                    if (StringList[i].Contains(StringsToReplace[k].Value))
                        temp = StringList[i].Replace(StringsToReplace[k].Value, StringsToReplace[k].Key);
                }
            }

            stopWatch2.Stop();

            if (j == 0) // discard first run, warm only
            {
                stopWatch1.Reset();
                stopWatch2.Reset();
            }
        }

        // Elapsed.TotalMilliseconds return in double, more accurate than ElapsedMilliseconds
        Console.WriteLine("Replace           : {0:N3} ms", stopWatch1.Elapsed.TotalMilliseconds / TotalIterations);
        Console.WriteLine("Contains > Replace: {0:N3} ms", stopWatch2.Elapsed.TotalMilliseconds / TotalIterations);

Results in favor of directly calling Replace because real replacement is needed 50% of time:

Replace           : 7.453 ms
Contains > Replace: 8.381 ms

Upvotes: 1

Phillip Schmidt

Reputation: 8818

If anyone is curious (you aren't, trust me), the equation to find out the necessary percentage of strings which need a replacement for the OP's method to be more efficient is (roughly) this:

Yeah, that's right.

Where:

C = Time it takes to create a new string
R = average Ratio of Delimiter position to string length
n = average string length
p = average position of the delimiter (quote in this case) in the string
t = time it takes to search one character of a string

So if your percentage of strings that have a quote in them is greater than this -- by all means, use the first method.

This doesn't account for variables I considered negligible, such as time to return, the actual replacement of the character (though you can really include that in C), or any kind of compiler optimization that I don't know about.

On a side note: the fact that I just spent half an hour solving the world's most useless equation is a perfect example of why I switched from Math to CS.

Upvotes: 0

Andre Calil

Reputation: 7692

String.Replace

This method performs an ordinal (case-sensitive and culture-insensitive) search to find oldValue.

String.Contains

This method performs an ordinal (case-sensitive and culture-insensitive) comparison. The search begins at the first character position of this string and continues through the last character position.

so, I'd say that there's no pratical difference. Still, I'll run a micro-benchmarking here.

Micro benchmarking (man, I love this)

The code:

        List<string> StringList = new List<string>();

        for (int i = 0; i < 10000; i++)
        {
            StringList.Add(DateTime.Now.Ticks + " abc ");
        }

        string temp;

        Dictionary<string, string> StringsToReplace = new Dictionary<string, string>();
        StringsToReplace.Add("1", ".1.");
        StringsToReplace.Add("a", "z");
        StringsToReplace.Add("b", "x");
        StringsToReplace.Add("c", "v");
        StringsToReplace.Add("d", "u");
        StringsToReplace.Add("e", "t");

        long ReplaceElapsedTime = 0;
        long ContainsReplaceElapsedTime = 0;

        int TotalIterations = 10000;

        for (int j = 0; j < TotalIterations; j++)
        {
            Stopwatch stopWatch = new Stopwatch();
            stopWatch.Start();

            for (int i = 0; i < StringList.Count; i++)
            {
                foreach (KeyValuePair<string, string> CurrentPair in StringsToReplace)
                {
                    temp = StringList[i].Replace(CurrentPair.Value, CurrentPair.Key);
                }
            }

            stopWatch.Stop();

            ReplaceElapsedTime += stopWatch.ElapsedMilliseconds;

            stopWatch.Reset();
            stopWatch.Start();

            for (int i = 0; i < StringList.Count; i++)
            {
                foreach (KeyValuePair<string, string> CurrentPair in StringsToReplace)
                {
                    if (StringList[i].Contains(CurrentPair.Value))
                        temp = StringList[i].Replace(CurrentPair.Value, CurrentPair.Key);
                }
            }

            stopWatch.Stop();

            ContainsReplaceElapsedTime += stopWatch.ElapsedMilliseconds;
        }

        Console.WriteLine(string.Format("Replace: {0} ms", ReplaceElapsedTime/TotalIterations));
        Console.WriteLine(string.Format("Contains > Replace: {0} ms", ContainsReplaceElapsedTime/TotalIterations));

        Console.ReadLine();

Results:

Replace: 14 ms 
Contains > Replace: 14 ms

Upvotes: 0

Guffa

Reputation: 700272

Some basic performance testing, doing each 10 000 000 times (with a string about 30 characters with a quotation mark somewhere in the middle), gives:

Contains("\"")          1327 ms.
Contains('"')           2949 ms.
Replace("\"", "\"\"")   2528 m.s

The Contains call takes about half the time of the Replace call. I thought that looking for a character would be faster than looking for a string, but surprisingly it isn't.

So, comparing the cases of checking first and always replacing to get the break even, where x is the time of the Contains call and y is how often you need to do the replacement:

x + 2xy = 2x

gives:

y = 0.5

This means that the two methods break even at about 50%. If you need to do the replacement more than half the time, there is no gain in checking for the existance first.

Upvotes: 4

Jon

Reputation: 437356

As with all performance-related questions, the answer is measure.

It depends on how intelligent Replace is (does it allocate any memory "in advance"? if so, how big a performance hit is that?), on the ratio of strings that need replacements over the total number of strings you process, on the lengths of the strings, which index the first " is located at (Contains is faster if it can return true sooner) etc.

Upvotes: 0

usr

Reputation: 171188

That depends on the ratio of strings needing fixup to all strings. If very little strings must be fixed this is certainly a win. The win will be very small though as Replace surely doesn't allocate a new string if it doesn't find any match. This is a common optimization.

The break-even point cannot be predicted, though. Measure.

Upvotes: 0

When escaping characters in a string, is it faster to test for the need to do so first?

Answers (7)

Related Questions