CoolCodeBro
CoolCodeBro

Reputation: 779

c# Fastest way to compare strings

I've noticed that

string1.Length == string2.Length && string1 == string2

on large strings is slightly faster than just

string1 == string2

Is this true? And is this a good practice to compare large string lengths before comparing actual strings?

Upvotes: 27

Views: 74444

Answers (9)

Free Coder 24
Free Coder 24

Reputation: 1043

For the geeks among us, here's a page which does a great job at benchmarking numerous ways to compare strings.

In a nutshell, the fastest method appears to be the CompareOrdinal:

if (string.CompareOrdinal(stringsWeWantToSeeIfMatches[x], stringsWeAreComparingAgainst[x]) == 0)
{
//they're equal
}

The second best way seems to be using either a Dictionary or Hashset with the "key" as the string you want to compare.

Makes for an interesting read.

Upvotes: 10

Habib
Habib

Reputation: 223187

String.Equality Operator or == internally calls string.Equals, so use string.Equals or == provided by the framework. It is already optimized enough.

It first compare references, then length and then actual characters.

You can find the source code here

Code: (Source: http://www.dotnetframework.org/default.aspx/4@0/4@0/DEVDIV_TFS/Dev10/Releases/RTMRel/ndp/clr/src/BCL/System/String@cs/1305376/String@cs)

// Determines whether two strings match.
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
public override bool Equals(Object obj) {
    if (this == null)                        //this is necessary to guard against reverse-pinvokes and
        throw new NullReferenceException();  //other callers who do not use the callvirt instruction

    String str = obj as String;
    if (str == null)
        return false;

    if (Object.ReferenceEquals(this, obj))
        return true;

    return EqualsHelper(this, str);
}

and

[System.Security.SecuritySafeCritical]  // auto-generated
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
private unsafe static bool EqualsHelper(String strA, String strB)
{
    Contract.Requires(strA != null);
    Contract.Requires(strB != null);
    int length = strA.Length;
    if (length != strB.Length) return false;

    fixed (char* ap = &strA.m_firstChar) fixed (char* bp = &strB.m_firstChar)
    {
        char* a = ap;
        char* b = bp;

        // unroll the loop
#if AMD64
        // for AMD64 bit platform we unroll by 12 and
        // check 3 qword at a time. This is less code
        // than the 32 bit case and is shorter
        // pathlength

        while (length >= 12)
        {
            if (*(long*)a     != *(long*)b) break;
            if (*(long*)(a+4) != *(long*)(b+4)) break;
            if (*(long*)(a+8) != *(long*)(b+8)) break;
            a += 12; b += 12; length -= 12;
        }
 #else
        while (length >= 10)
        {
            if (*(int*)a != *(int*)b) break;
            if (*(int*)(a+2) != *(int*)(b+2)) break;
            if (*(int*)(a+4) != *(int*)(b+4)) break;
            if (*(int*)(a+6) != *(int*)(b+6)) break;
            if (*(int*)(a+8) != *(int*)(b+8)) break;
            a += 10; b += 10; length -= 10;
        }
  #endif

        // This depends on the fact that the String objects are
        // always zero terminated and that the terminating zero is not included
        // in the length. For odd string sizes, the last compare will include
        // the zero terminator.
        while (length > 0)
        {
            if (*(int*)a != *(int*)b) break;
            a += 2; b += 2; length -= 2;
        }

        return (length <= 0);
    }
}

Upvotes: 19

Ricardo Pieper
Ricardo Pieper

Reputation: 121

If you expect the strings to be different in their lenghts in most of the time, you can compare their lenghts AND then compare the strings itself by using string.Compare. I got almost 50% performance improvement by doing this:

if (str1.Length == str2.Length)
{
    if (string.Compare(str1, str2, StringComparison.Ordinal) == 0)
    {
       doSomething()
    }
}

In this case, I expect the strings to be different almost all the time, I think str1.Lenght is way cheaper than comparing the actual strings. If they are equal in size, I compare them.

EDIT: Forget what I said. Just use == and be happy.

Upvotes: -1

Napivo
Napivo

Reputation: 593

My test results

Compare 10000 strings to 10000 other strings all the same length (256)

Time (s1 == s2): 32536889 ticks

Time (s1.Length == s2.Length) && (s1 == s2): 37380529 ticks

Compare 10000 strings to 10000 other strings Random length max 256

Time (s1 == s2): 27223517 ticks

Time (s1.Length == s2.Length) && (s1 == s2): 23419529 ticks

Compare 10000 strings to 10000 other strings Random length min 256 max 512

Time (s1 == s2): 28904898 ticks

Time (s1.Length == s2.Length) && (s1 == s2): 25442710 ticks

What I find mind boggling is that a compare of 10000 equal length strings will take longer than comparing the same amount of data that is larger.

All these test have been done with exactly the same data.

Upvotes: 7

Misha Zaslavsky
Misha Zaslavsky

Reputation: 9616

So as I promised I wrote a short code with a stopwatch - you can copy paste it and try on different strings and see the differences

class Program
{
    static void Main(string[] args)
    {
        string str1 = "put the first value";
        string str2 = "put the second value";
        CompareTwoStringsWithStopWatch(str1, str2); //Print the results.
    }

    private static void CompareTwoStringsWithStopWatch(string str1, string str2)
    {
        Stopwatch stopwatch = new Stopwatch();

        stopwatch.Start();
        for (int i = 0; i < 99999999; i++)
        {
            if (str1.Length == str2.Length && str1 == str2)
            {
                SomeOperation();
            }
        }
        stopwatch.Stop();

        Console.WriteLine("{0}. Time: {1}", "Result for: str1.Length == str2.Length && str1 == str2", stopwatch.Elapsed);
        stopwatch.Reset();

        stopwatch.Start();
        for (int i = 0; i < 99999999; i++)
        {
            if (str1 == str2)
            {
                SomeOperation();
            }
        }
        stopwatch.Stop();

        Console.WriteLine("{0}. Time: {1}", "Result for: str1 == str2", stopwatch.Elapsed);
    }

    private static int SomeOperation()
    {
        var value = 500;
        value += 5;

        return value - 300;
    }
}

My conclusions:

  1. As I checked some strings (short ones and long ones) I saw that all the results are almost the same. So the first if (with the length check) is slower in 2/3.
  2. And you have an Equals method in the Object class, just use it :)
  3. You can try it and give us the results also :)

Upvotes: 0

p.s.w.g
p.s.w.g

Reputation: 148980

According ILSpy, the string == operator is defined as:

public static bool operator ==(string a, string b)
{
    return string.Equals(a, b);
}

Which is defined as

public static bool Equals(string a, string b)
{
    return a == b || (a != null && b != null && a.Length == b.Length && string.EqualsHelper(a, b));
}

I assume that first a == b is actually a reference equality check (ILSpy is just rendering it as ==), otherwise this would be an infinitely recursive method.

This means that == already checks the lengths of the strings before actually comparing their characters.

Upvotes: 4

usr
usr

Reputation: 171178

strings operator equals does the length check before comparing the chars. So you do not save the comparison of the contents with this trick. You might still save a few CPU cycles because your length check assumes that the strings are not null, while the BCL must check that. So if the lengths are not equal most of the time, you will short-circuit a few instructions.

I might just be wrong here, though. Maybe the operator gets inlined and the checks optimized out. Who knows for sure? (He who measures.)

If you care about saving every cycle you can you should probably use a different strategy in the first place. Maybe managed code is not even the right choice. Given that, I recommend to use the shorter form and not use the additional check.

Upvotes: 28

Ben Voigt
Ben Voigt

Reputation: 283614

In terminated strings, it makes sense to just start comparing characters, since you can't calculate the string lengths without iterating all characters anyway, and the comparison is likely to early exit.

With length-counted strings, comparing the length should be done first, if you are testing for byte-wise equality. You can't even start accessing character data without retrieving the length, since one could be zero-length.

If you are doing a relational comparison, knowing the lengths are different doesn't tell you if the result should be positive or negative. And in a culture-aware comparison, equal strings do not imply equal lengths. So for both of those you need to just compare data.

If operator==(string, string) simply delegates to a relational comparison, you wouldn't expect that to compare lengths. Checking length before doing the comparison could therefore be a benefit. But it seems like the Framework does start with a length check.

Upvotes: 3

JLe
JLe

Reputation: 2904

I'd say the first one is faster is the result of string1.Length == string2.Length is false. Thanks to Short Circuit Evalution (SCE) the actual comparision between the strings is then not made, which might save you time.

If the strings are equal however, the first one is slower since it will check the length first and then do the same thing as the second one.

See http://msdn.microsoft.com/en-us/library/2a723cdk.aspx for information about the && operator and SCE.

Upvotes: 0

Related Questions