Reputation: 30922
I had a thought before when comparing two strings with their variables:
string str1 = "foofoo";
string strFoo = "foo";
string str2 = strFoo + strFoo;
// Even though str1 and str2 reference two different
// objects, the following assertion is true.
Debug.Assert(str1 == str2);
Is this purely because the .NET runtime recognises that the strings' values are the same and, because strings are immutable, makes the reference of str2 equal to that of str1?
So when we do str1 == str2, are we actually comparing references and not values? I originally thought this was the product of syntactic sugar, but was I incorrect?
Any inaccuracies with what I've written?
Upvotes: 13
Views: 1467
Reputation: 116471
If we take a look at the jitted code, we'll see that str2 is assembled using String.Concat and that it is in fact not the same reference as str1. We will also see that the comparison is done using Equals. In other words, the assert passes because the strings contain the same characters.
This code
static void Main(string[] args)
{
    string str1 = "foofoo";
    string strFoo = "foo";
    string str2 = strFoo + strFoo;
    Console.WriteLine(str1 == str2);
    Debugger.Break();
}
is jitted to (please scroll sideways to see comments)
C:\dev\sandbox\cs-console\Program.cs @ 22:
00340070 55 push ebp
00340071 8bec mov ebp,esp
00340073 56 push esi
00340074 8b3530206003 mov esi,dword ptr ds:[3602030h] ("foofoo") <-- Note address of "foofoo"
C:\dev\sandbox\cs-console\Program.cs @ 23:
0034007a 8b0d34206003 mov ecx,dword ptr ds:[3602034h] ("foo") <-- Note different address for "foo"
C:\dev\sandbox\cs-console\Program.cs @ 24:
00340080 8bd1 mov edx,ecx
00340082 e81977fe6c call mscorlib_ni+0x2b77a0 (6d3277a0) (System.String.Concat(System.String, System.String), mdToken: 0600035f) <-- Call String.Concat to assemble str2
00340087 8bd0 mov edx,eax
00340089 8bce mov ecx,esi
0034008b e870ebfd6c call mscorlib_ni+0x2aec00 (6d31ec00) (System.String.Equals(System.String, System.String), mdToken: 060002d2) <-- Compare using String.Equals
00340090 0fb6f0 movzx esi,al
00340093 e83870f86c call mscorlib_ni+0x2570d0 (6d2c70d0) (System.Console.get_Out(), mdToken: 060008fd)
00340098 8bc8 mov ecx,eax
0034009a 8bd6 mov edx,esi
0034009c 8b01 mov eax,dword ptr [ecx]
0034009e 8b4038 mov eax,dword ptr [eax+38h]
003400a1 ff5010 call dword ptr [eax+10h]
C:\dev\sandbox\cs-console\Program.cs @ 28:
003400a4 e87775596d call mscorlib_ni+0x867620 (6d8d7620) (System.Diagnostics.Debugger.Break(), mdToken: 0600239a)
C:\dev\sandbox\cs-console\Program.cs @ 29:
>>> 003400a9 5e pop esi
003400aa 5d pop ebp
003400ab c3 ret
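The same observation can be made without a debugger. The following is a minimal sketch, not the code that was jitted above; the class name EqualityDemo and the helper BuildAtRunTime are made up for illustration. It compares the two strings once by value and once by reference.

```csharp
using System;

static class EqualityDemo
{
    // Hypothetical helper: strFoo is a non-const local, so the
    // concatenation is performed at run time via String.Concat.
    public static string BuildAtRunTime()
    {
        string strFoo = "foo";
        return strFoo + strFoo;
    }

    static void Main()
    {
        string str1 = "foofoo";
        string str2 = BuildAtRunTime();

        // String's == operator compares the characters.
        Console.WriteLine(str1 == str2);
        // ReferenceEquals compares object identity only.
        Console.WriteLine(object.ReferenceEquals(str1, str2));
    }
}
```

Consistent with the disassembly, the first line printed should be True and the second False.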
Upvotes: 7
Reputation: 113352
In the order in which your code hits it...
== is overloaded for string. This means that rather than "abc" == "ab" + "c" calling the default == for reference types (which compares references and not values), it calls into string.Equals(a, b).
Now, in outline, this checks reference equality first, then nulls, then lengths, and only then the characters themselves. In other words, it starts with something like:
public static bool operator ==(string x, string y)
{
    //step 1: same reference (this also covers null == null)
    if(ReferenceEquals(x, y))
        return true;
    //step 2: exactly one side is null
    if(ReferenceEquals(x, null) || ReferenceEquals(y, null))
        return false;
    //step 3: different lengths can never be equal
    int len = x.Length;
    if(len != y.Length)
        return false;
    //step 4: compare character by character
    for(int i = 0; i != len; ++i)
        if(x[i] != y[i])
            return false;
    return true;
}
Except that step 4 is a pointer-based version with an unrolled loop that should hence ideally be faster. I won't show that because I want to talk about the overall logic.
There are significant short-cuts. The first is in step 1. Since equality is reflexive (identity entails equality, a == a), we can return true in nanoseconds even for a string several MB in size, if it is compared with itself.
Step 2 isn't a short-cut, because it's a condition that must be tested for, but note that because we'll already have returned true for (string)null == (string)null we don't need another branch. So the order of calling is geared to a quick result.
Step 3 allows two things. It both short-cuts on strings of different length (always false) and means that one cannot accidentally shoot past the end of one of the strings being compared in step 4.
Note that this is not the case for other string comparisons, since e.g. WEISSBIER and weißbier are different lengths but the same word in different capitalisation, so case-insensitive comparison cannot use step 3. All equality comparisons can do steps 1 and 2, as the rules used always hold, so you should use them in your own; only some can do step 3.
Hence, while you are wrong in suggesting that it is references rather than values that are compared, it is true that references are compared first as a very significant short-cut. Note also that interned strings (strings placed in the intern pool by compilation or by a call to string.Intern) will hence trigger this short-cut often. This would be the case in your example only if the concatenation were a compile-time constant such as "foo" + "foo", where the compiler uses the same reference in each case; because strFoo is a non-const local, str2 is built at run time and is a separate object.
If you know that a string was interned you can depend upon this (just do reference equality test), but even if you don't know for sure you can benefit from it (reference equality test will short-cut at least some of the time).
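As a rough illustration of relying on interning, the sketch below builds the same text twice at run time and then interns both copies. The names InternDemo and BuildAtRunTime are hypothetical and exist only for this example.

```csharp
using System;

static class InternDemo
{
    // Hypothetical helper: returns a freshly allocated "foofoo" each call.
    public static string BuildAtRunTime()
    {
        string strFoo = "foo";
        return strFoo + strFoo;
    }

    static void Main()
    {
        string first = BuildAtRunTime();
        string second = BuildAtRunTime();

        // Two separate objects with the same characters.
        Console.WriteLine(object.ReferenceEquals(first, second));

        // string.Intern returns the pool's single reference for that text,
        // so a reference test is enough once both sides are interned.
        string a = string.Intern(first);
        string b = string.Intern(second);
        Console.WriteLine(object.ReferenceEquals(a, b));
    }
}
```

This should print False for the raw strings and True for the interned ones. Keep in mind that interned strings normally stay alive for the lifetime of the process.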
If you have a bunch of strings where you will want to test some of them against each other often, but you don't want to extend their lifetime in memory as much as interning does, then you could use an XmlNameTable or LockFreeAtomizer (soon to be renamed ThreadSafeAtomizer and the doc moved to http://hackcraft.github.com/Ariadne/documentation/html/T_Ariadne_ThreadSafeAtomizer_1.htm - should have been named for function rather than implementation details in the first place).
The former is used internally by XmlTextReader and hence by a lot of the rest of System.Xml, and can be used by other code too. The latter I wrote because I wanted a similar idea, that was safe for concurrent calls, for different types, and where I could override the equality comparison.
In either case, if you put 50 different strings that are all "abc" into it, you'll get a single "abc" reference back, allowing the others to be garbage collected. If you know this has happened you can depend upon ReferenceEquals alone, and if you're not sure, you'll still benefit from the short-cut when it is the case.
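For the XmlNameTable route, a minimal sketch using System.Xml.NameTable (the concrete XmlNameTable implementation in the framework) might look like this. The class name AtomizeDemo and the helper Fresh are invented for the example, and the Ariadne atomizer is not shown because its API is not quoted here.

```csharp
using System;
using System.Xml;

static class AtomizeDemo
{
    // Hypothetical helper: allocates a new "abc" string on every call.
    public static string Fresh()
    {
        return new string(new[] { 'a', 'b', 'c' });
    }

    static void Main()
    {
        var table = new NameTable();

        // Add returns the table's single atomized instance for that text.
        string first = table.Add(Fresh());
        string second = table.Add(Fresh());

        Console.WriteLine(object.ReferenceEquals(first, second));
    }
}
```

Both calls to Add should hand back the same reference, so this prints True, and the second freshly allocated copy becomes eligible for garbage collection. Unlike the intern pool, the strings only live as long as the table itself.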
Upvotes: 1
Reputation: 1569
According to MSDN (http://msdn.microsoft.com/en-us/library/53k8ybth.aspx):
For predefined value types, the equality operator (==) returns true if the values of its operands are equal, false otherwise. For reference types other than string, == returns true if its two operands refer to the same object. For the string type, == compares the values of the strings.
Upvotes: 0
Reputation: 46394
The == operator, which by default tests reference equality for reference types, can be overloaded; in the case of System.String it is overloaded to use value-equality behavior. For true reference equality you can use the Object.ReferenceEquals() method, which is static and so cannot be overridden.
Upvotes: 2
Reputation: 51369
No.
== works because the String class overloads the == operator to be equivalent to the Equals method.
From Reflector:
[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
public static bool operator ==(string a, string b)
{
    return Equals(a, b);
}
Upvotes: 10
Reputation: 62157
Is this purely because the .NET runtime recognises the string's value is the same and because strings are immutable makes the reference of str2 equal to that of str1?
No. First, it is because str1 and str2 ARE identical - they are the same string because the compiler can optimize that out. strFoo + strFoo is a compile time constant identical to str1. As strings are INTERNED in classes they use the same string.
Second, string OVERRIDES the == method. Check the source code from the reference sources, which have been available on the internet for some time.
Upvotes: 2
Reputation: 52798
The answer is in the C# Spec §7.10.7
The string equality operators compare string values rather than string references. When two separate string instances contain the exact same sequence of characters, the values of the strings are equal, but the references are different. As described in §7.10.6, the reference type equality operators can be used to compare string references instead of string values.
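To make the last sentence of that quote concrete, here is a small sketch of selecting the reference-type operator by casting both operands to object. The names OperatorDemo and BuildAtRunTime are hypothetical.

```csharp
using System;

static class OperatorDemo
{
    // Hypothetical helper: produces "foofoo" at run time as a new object.
    public static string BuildAtRunTime()
    {
        string strFoo = "foo";
        return strFoo + strFoo;
    }

    static void Main()
    {
        string str1 = "foofoo";
        string str2 = BuildAtRunTime();

        // Both operands are typed string: the string operator compares values.
        Console.WriteLine(str1 == str2);

        // Both operands are typed object: the reference operator is chosen
        // at compile time, so only identity is compared.
        Console.WriteLine((object)str1 == (object)str2);
    }
}
```

Operator overloads are resolved from the static types of the operands, which is why the cast changes the result: this should print True and then False.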
Upvotes: 14
Reputation: 46977
Actually, String.Equals first checks if it is the same reference and, if not, compares the content.
Upvotes: 7