Royi Namir

Reputation: 148734

Empty string as a special case?

I read Jon Skeet's quiz and wondered why my second sample doesn't behave the same way as the first.

Why does this yield true:

object x = new string("".ToArray());
object y = new string("".ToArray());
Console.WriteLine(x == y); //true

But this one does not:

var k="k";
//string.Intern(k); // doesn't help
object x = new string(k.ToArray());
object y = new string(k.ToArray());
Console.WriteLine(x == y); //false

I'm using .NET Framework 4.5 with VS2010.

I also have VS2005 installed, and it gives the same results.

Upvotes: 56

Views: 2871

Answers (7)

MarcinJuraszek

Reputation: 125660

Here is a blog post by Eric Lippert which answers your question: String interning and String.Empty.

He describes a similar situation:

object obj = "Int32";
string str1 = "Int32";
string str2 = typeof(int).Name;
Console.WriteLine(obj == str1); // true
Console.WriteLine(str1 == str2); // true
Console.WriteLine(obj == str2); // false !?

So the idea is that interning does not guarantee there is only one instance of a particular string value. Only compile-time literals are interned by default. That means the following code prints true:

var k1 = "k";
object k2 = "k";
Console.WriteLine(k1 == k2);

But if you create a string with the content "k" programmatically at runtime, e.g. using the string(char[]) constructor, calling ToString() on an object, or using StringBuilder, you won't get an interned string by default. This one prints false:

var k1 = "k";
object k2 = new string("k".ToCharArray());
Console.WriteLine(k1 == k2);

Why? Because interning strings at runtime is expensive.

There Ain't No Such Thing As A Free Lunch.

(...)

In short, it is in the general case not worth it to intern all strings.

And about different behavior with empty string:

Some versions of the .NET runtime automatically intern the empty string at runtime, some do not!

Upvotes: 44

Pavel Zhuravlev

Reputation: 2791

The first case compares two references to the same object (String.Empty). Applying operator== to two variables of type object compares them by reference, which gives true.

The second case produces two different instances of the string class. Their reference comparison gives false.

If you declare x and y as string in the second case, the overloaded string.operator== is called instead and the comparison gives true.
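As a rough sketch of that difference (the class and method names below are made up for illustration), the same two runtime-built strings compare differently depending on the static type of the operands:

```csharp
using System;

static class ReferenceVsValueDemo
{
    // Hypothetical helper: builds a fresh, non-literal string at run time.
    public static string Fresh(string s)
    {
        return new string(s.ToCharArray());
    }

    // With object operands, == is a reference comparison.
    public static bool AsObjects(object a, object b)
    {
        return a == b;
    }

    // With string operands, == binds to string.operator==, which compares contents.
    public static bool AsStrings(string a, string b)
    {
        return a == b;
    }

    static void Main()
    {
        string x = Fresh("k");
        string y = Fresh("k");
        Console.WriteLine(AsObjects(x, y)); // False: two distinct instances
        Console.WriteLine(AsStrings(x, y)); // True: same characters
    }
}
```

The operator is chosen at compile time from the declared types, which is why changing only the variable types changes the result.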

Note that neither case deals with string interning directly. The string objects being compared are created with the string(char[]) constructor. Apparently that constructor is designed to return the value of the string.Empty field when called with an empty array as its argument.

The answer posted by MarcinJuraszek refers to Eric Lippert's blog post on string interning. That post discusses a different corner case of the string class. Consider this example from it:

object obj = "";
string str1 = "";
string str2 = String.Empty;
Console.WriteLine(obj == str1); // true
Console.WriteLine(str1 == str2); // true
Console.WriteLine(obj == str2); // sometimes true, sometimes false?!

What we see here is that the assignment from the empty string literal ("") is not guaranteed to produce the reference to the static readonly System.String.Empty field.

Let's look at the IL for the object x = new string("".ToArray()); expression:

IL_0001:  ldstr      ""
IL_0006:  call       !!0[] [System.Core]System.Linq.Enumerable::ToArray<char>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>)
IL_000b:  newobj     instance void [mscorlib]System.String::.ctor(char[])
IL_0010:  stloc.0

The interning may (or may not) happen at the IL_0001 line. Whether the literal is interned or not, the ToArray() method produces a new empty array and the String::.ctor(char[]) gives us String.Empty.
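A small probe along these lines can show what a given runtime does with the empty-array constructor call. It is only a sketch: the class and method names are invented here, and no expected output is given because, as quoted elsewhere in this thread, the behavior has varied between runtime versions.

```csharp
using System;

static class EmptyStringProbe
{
    // Hypothetical helper: calls the string(char[]) constructor with an empty array.
    public static string BuildEmpty()
    {
        return new string(new char[0]);
    }

    // Reports whether the constructor handed back the String.Empty instance.
    public static bool CtorReturnsStringEmpty()
    {
        return object.ReferenceEquals(BuildEmpty(), string.Empty);
    }

    static void Main()
    {
        // Results depend on the runtime in use, so none are asserted here.
        Console.WriteLine(CtorReturnsStringEmpty());
        Console.WriteLine(object.ReferenceEquals(BuildEmpty(), BuildEmpty()));
    }
}
```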

What we see here is not so much a special case of string.Empty as a side effect of the string class being a reference type and immutable at the same time. There are other immutable framework types with predefined values that have similar semantics (like DateTime.MinValue), but as far as I know those types are defined as structs, unlike string, which is a reference type. Value types are a totally different story.

It would not make sense to return a fixed predefined instance from the constructor of a mutable class: the calling code could change that instance and cause unpredictable behavior. So reference types whose constructors do not always return new instances can exist, provided those types are immutable. I am not aware of any such type in the framework other than string, though.

Upvotes: 6

Pedro.The.Kid

Reputation: 2078

Empty strings are a special case: operations that produce an empty string always return the same object, which is why a reference comparison between them gives true.

[Edit]: the previous code was using the string comparison instead of the object (reference) comparison.

object a = "s";
object b = "d";

a = ((string)a).Replace("s", "");
b = ((string)b).Replace("d", "");

Console.WriteLine(a == b);

object c = "sa";
object d = "da";

c = ((string)c).Replace("s", "");
d = ((string)d).Replace("d", "");

Console.WriteLine(c == d);

c = ((string)c).Replace("a", "");
d = ((string)d).Replace("a", "");

Console.WriteLine(c == d);

Result:

True
False
True

Upvotes: 2

BenM

Reputation: 529

Note that interning the new strings in the second block of code does make them equal.

var k="k";
object x = string.Intern(new string(k.ToArray()));
object y = string.Intern(new string(k.ToArray()));
Console.WriteLine(x == y); //true

It seems the runtime interns empty strings automatically, while non-empty strings aren't interned unless you do it explicitly (or they're string literals, which are always interned).

I'm guessing that yes, empty strings are being treated as a special case and being interned automatically, probably because the check is so trivial that it doesn't add any real performance penalty (we can safely say that ANY string of length 0 is the empty string and is identical to any other empty string -- all other strings require us to look at the characters and not just the length).
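To make the explicit route concrete, here is a minimal sketch (the class and helper names are hypothetical) showing that string.Intern maps equal-valued strings to one shared reference, and that string.IsInterned can be used to check the pool without adding to it:

```csharp
using System;

static class InternDemo
{
    // Hypothetical helper: builds a fresh, non-literal string at run time.
    public static string Fresh(string s)
    {
        return new string(s.ToCharArray());
    }

    // True when both arguments map to the same pooled instance.
    public static bool SameAfterIntern(string a, string b)
    {
        return object.ReferenceEquals(string.Intern(a), string.Intern(b));
    }

    static void Main()
    {
        string a = Fresh("k");
        string b = Fresh("k");

        Console.WriteLine(object.ReferenceEquals(a, b)); // False: separate instances
        Console.WriteLine(SameAfterIntern(a, b));        // True: one pooled instance
        Console.WriteLine(string.IsInterned(a) != null); // True: "k" is now in the pool
    }
}
```

Note that string.Intern returns the pooled reference rather than changing its argument, so the return value has to be used, as in the answer's code above.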

Upvotes: 11

Dgan

Reputation: 10295

I think this may be the reason; see Jon Skeet's answer about string comparison:

Are string.Equals() and == operator really same?

// Get(...) is a helper (not shown here) that formats an object's address.
// Note: x1 and y1 are char[] arrays here, not strings.
object x1 = new StringBuilder("").ToString().ToArray();
object y1 = new StringBuilder("").ToString().ToArray();
Console.WriteLine(x1 == y1); // False in the output below

Console.WriteLine("Address x1:" + Get(x1));
Console.WriteLine("Address y1:" + Get(y1));

var k = "k";
//string.Intern(k); // doesn't help
object x = new string(k.ToArray());
object y = new string(k.ToArray());
Console.WriteLine(x == y); //false

Console.WriteLine("Address x:" + Get(x));
Console.WriteLine("Address y:" + Get(y));

Console.Read();

Output

False
Address x1:0x2613E5
Address y1:0x2613E5
False
Address x:0x2613E5
Address y:0x2613E5

Upvotes: 2

Kind Contributor

Reputation: 18591

According to http://msdn.microsoft.com/en-us/library/system.string.intern(v=vs.110).aspx

In the .NET Framework 3.5 Service Pack 1, the Intern method reverts to its behavior in the .NET Framework 1.0 and 1.1 with regard to interning the empty string...

...In the .NET Framework 1.0, .NET Framework 1.1, and .NET Framework 3.5 SP1, ~empty strings~ are equal

This means empty strings are interned by default, even when constructed from an empty array, and therefore compare equal.

Furthermore:

The .NET Framework version 2.0 introduces the CompilationRelaxations.NoStringInterning enumeration member

This most likely gives you a way to make the comparison behave consistently, although as @BenM suggests, you would do better to call the Intern function explicitly.

Because x and y are declared as object, == compares references (no boxing is involved, since string is a reference type), so you could also use string.Equals instead of ==.
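As an illustration of that last suggestion (a sketch only; the class and method names are invented), Equals is virtual, so it still compares characters when the variables are typed as object:

```csharp
using System;

static class EqualsDemo
{
    // The static object.Equals handles nulls and then calls the virtual
    // a.Equals(b), which string overrides to compare contents.
    public static bool SameContent(object a, object b)
    {
        return object.Equals(a, b);
    }

    static void Main()
    {
        object x = new string(new[] { 'k' });
        object y = new string(new[] { 'k' });

        Console.WriteLine(x == y);                       // False: reference comparison
        Console.WriteLine(object.ReferenceEquals(x, y)); // False: distinct instances
        Console.WriteLine(SameContent(x, y));            // True: same characters
    }
}
```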

Upvotes: 2

Matthew

Reputation: 25793

My hypothesis for why the first one yields true while the second yields false:

The first result may be an optimization. Take the following code:

Enumerable.Empty<char>() == Enumerable.Empty<char>() // true

So if the ToArray method returns Enumerable.Empty<char>() when the string is empty, that would explain why the first result yields true and the second doesn't, since it's doing a reference check.

Upvotes: 4
