Reputation: 3795

C#: Why does .ToString() append text faster to an int converted to string?

This is from the book C# in a Nutshell:

StringBuilder sb = new StringBuilder();
for(int i = 0; i < 50; i++) 
     sb.Append (i + ",");

//Outputs 0,1,2,3.............49,

However, it then says: "the expression i + "," means that we are still repeatedly concatenating strings, however this only incurs a small performance cost as strings are small".

Then it says that changing it to the lines below makes it faster

for(int i = 0; i < 50; i++) {
    sb.Append(i.ToString()); 
    sb.Append(",");
}

But why is that faster? Now we have an extra step where i is being converted to a string? What is actually going on under the hood here? There isn't any more explanation in the rest of the chapter.

Upvotes: 7

Answers (3)

Gjeltema

Reputation: 4166

The first two answers to your question are not quite correct. The sb.Append(i + ","); statement does not call i.ToString() directly. What it actually compiles to is

StringBuilder.Append(string.Concat((object)i, (object)","));

Internally in the string.Concat function, it calls ToString() on the two objects passed in. The key performance concern in this statement is (object)i. This is boxing - wrapping a value type inside a reference. This is a (relatively) sizable performance hit, as it takes extra cycles and memory allocation to box something, and then there's extra garbage collection required.

You can see this happening in the IL of the (Release) compiled code:

IL_000c:  box        [mscorlib]System.Int32
IL_0011:  ldstr      ","
IL_0016:  call       string [mscorlib]System.String::Concat(object,
                                                            object)
IL_001b:  callvirt   instance class [mscorlib]System.Text.StringBuilder 
                     [mscorlib]System.Text.StringBuilder::Append(string)

Note that the first line is a box instruction, followed by a Concat call, and finally the call to Append.

If you call i.ToString() instead, as shown below, you forgo the boxing and also the string.Concat() call.

for (int i = 0; i < 50; i++)
{
    sb.Append(i.ToString());
    sb.Append(",");
}

This call yields the following IL:

IL_000b:  ldloca.s   i
IL_000d:  call       instance string [mscorlib]System.Int32::ToString()
IL_0012:  callvirt   instance class [mscorlib]System.Text.StringBuilder
                     [mscorlib]System.Text.StringBuilder::Append(string)
IL_0017:  pop
IL_0018:  ldloc.0
IL_0019:  ldstr      ","
IL_001e:  callvirt   instance class [mscorlib]System.Text.StringBuilder
                     [mscorlib]System.Text.StringBuilder::Append(string)

Note that there is no boxing and no String.Concat, so fewer objects are created that need to be collected and fewer cycles are wasted on boxing, at the cost of one additional Append() call, which is much cheaper by comparison.

This is why the second version performs better.
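To make the comparison concrete, here is a minimal sketch that writes the first version out in the explicit form shown in the IL and checks that all three variants build the same text. The class and method names (AppendDemo, WithConcat, and so on) are made up for illustration, and the IL a given compiler version emits for i + "," may differ from the listing above.

```csharp
using System;
using System.Text;

public static class AppendDemo
{
    // Version from the question: i + "," creates a temporary string per iteration.
    public static string WithConcat(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
            sb.Append(i + ",");
        return sb.ToString();
    }

    // The same call spelled out as in the IL: the int is boxed explicitly,
    // then string.Concat(object, object) calls ToString() on both arguments.
    public static string WithExplicitBoxing(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
            sb.Append(string.Concat((object)i, (object)","));
        return sb.ToString();
    }

    // Second version: no boxing and no temporary concatenated string.
    public static string WithToString(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append(i.ToString());
            sb.Append(",");
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WithConcat(5));          // 0,1,2,3,4,
        Console.WriteLine(WithExplicitBoxing(5));  // 0,1,2,3,4,
        Console.WriteLine(WithToString(5));        // 0,1,2,3,4,
    }
}
```

The output is identical in all three cases. The only difference is how many intermediate objects are allocated along the way.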

You can extend this idea to many other cases. Whenever you pass a value type to a string-handling function that has no overload for that exact type (calls that take an object argument, such as string.Format()), it's a good idea to call <valuetype>.ToString() on the argument first, so that no boxing occurs.
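As a small illustration of the string.Format() case, the sketch below formats the same int twice, once letting it be boxed into the object parameter and once converting it first. FormatDemo and its method names are hypothetical, chosen only for this example.

```csharp
using System;

public static class FormatDemo
{
    // The int does not match a typed overload, so it is boxed to fit
    // the object parameter of string.Format(string, object).
    public static string Boxed(int value)
    {
        return string.Format("{0},", value);
    }

    // Converting first passes a string, which is already a reference type,
    // so no boxing takes place.
    public static string PreConverted(int value)
    {
        return string.Format("{0},", value.ToString());
    }

    public static void Main()
    {
        Console.WriteLine(Boxed(42));         // 42,
        Console.WriteLine(PreConverted(42));  // 42,
    }
}
```

Both calls return the same text, so the change is purely about avoiding the allocation.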

In response to Theodoros' question in the comment:

The compiler team certainly could have decided to do such an optimization, but my guess is that they decided that the cost (in terms of additional complexity, time, additional testing, etc.) made the value of such a change not worth the investment.

Basically, they would have had to put in a special case branching for functions that ostensibly operate on strings, but offer an overload with object in it (basically, if (boxing occurs && overload has string)). Inside that branch the compiler would have to also check to verify that the object function overload does the same things as the string overload with the exception of calling ToString() on the arguments - it needs to do this because a user could create function overloads in which one function takes a string and another takes an object, but the two overloads perform different work on the arguments.

This seems to me like a lot of complexity and analysis for making a minor optimization to a few string manipulation functions. Additionally, this would be mucking around with the core compiler function resolution code, which already has some very exact rules that people misunderstand all the time (take a look at a number of Eric Lippert's answers - quite a few revolve around function resolution issues). Making it more complicated with "it works like this, except when you have that situation" type rules is certainly something to be avoided if the return is minimal.

The less expensive and less complex solution is to use the base function resolution rules, and let the compiler resolve you passing in a value type (like an int) into a function, and having it figure out that the only function signature that fits it is one that takes object, and do a box. Then rely on users to do the optimization of ToString() when they profile their code and determine it is necessary (or just know about this behavior and do it all the time anyway when they encounter the situation, which I do).

A more likely alternative would have been to add a number of string.Concat overloads that take ints, doubles, etc. (like string.Concat(int, int)) and call ToString on the arguments internally, where they would not be boxed. This has the advantage that the optimization lives in the class library instead of the compiler, but you inevitably run into situations where you want to mix types in the concatenation, like the original question here, which would need string.Concat(int, string). The permutations would explode, which is the likely reason they did not do so. They also could have identified the most common cases and added overloads for just the top 5, but I'm guessing they decided that would only invite people to ask "well, you did (int, string), why don't you do (string, int)?".
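To show what such typed overloads might look like, here is a rough sketch of two of them written as ordinary helper methods. These are hypothetical: ConcatHelpers and its Concat methods are invented for this example and are not part of the class library described in this answer.

```csharp
using System;

public static class ConcatHelpers
{
    // Hypothetical (int, string) overload: converts the int directly,
    // so the value is never boxed.
    public static string Concat(int left, string right)
    {
        return string.Concat(left.ToString(), right);
    }

    // Hypothetical mirror overload for (string, int). Every additional
    // type combination needs its own method, which is the explosion of
    // permutations mentioned above.
    public static string Concat(string left, int right)
    {
        return string.Concat(left, right.ToString());
    }

    public static void Main()
    {
        Console.WriteLine(Concat(7, ","));   // 7,
        Console.WriteLine(Concat("n=", 7));  // n=7
    }
}
```

Just two types in two positions already need two extra methods. Adding double, long, and so on multiplies that quickly.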

Upvotes: 15

JBrooks

Reputation: 10013

When you do the following:

string x = "abc";
x = x + "d";     // or even x += "d";

the second line actually ends up abandoning the first string, with the value "abc", and creates a new string "abcd" for x. I think that is the performance hit you are seeing.
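Here is a minimal sketch of that behaviour, assuming a non-empty suffix. ImmutabilityDemo and AppendCreatesNewInstance are names made up for the example.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Returns true when appending produced a different string object
    // and left the original string untouched.
    public static bool AppendCreatesNewInstance(string suffix)
    {
        string x = "abc";
        string original = x;   // second reference to the same "abc" object
        x += suffix;           // x now refers to a newly created string

        bool differentObject = !object.ReferenceEquals(x, original);
        bool originalUnchanged = original == "abc";
        return differentObject && originalUnchanged;
    }

    public static void Main()
    {
        Console.WriteLine(AppendCreatesNewInstance("d"));  // True
    }
}
```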

Upvotes: 0

Jon

Reputation: 437554

"Now we have an extra step where i is being converted to a string?"

It's not an extra step. Even in the first snippet, obviously the integer i has to be converted to a string somewhere -- this is taken care of by the addition operator so it happens where you don't see it, but it still happens.

The second snippet is faster because it does not have to create a new string by concatenating the result of i.ToString() and ",".

Here's what the first version does:

sb.Append(i + ",");

1. Call i.ToString.
2. Create a new string (think new string(iAsString + ",")).
3. Call sb.Append.

Here's what the second version does:

1. Call i.ToString.
2. Call sb.Append.
3. Call sb.Append.

As you can see the only difference is the second step, where calling sb.Append in the second version is expected to be faster than concatenating two strings and creating another instance from the result.
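If you want to check this on your own machine, a rough timing sketch could look like the one below. AppendTiming and its method names are hypothetical, Stopwatch is only a coarse measuring tool, and the numbers depend heavily on the compiler version, the runtime, and JIT warm-up, so treat the result as an indication rather than a benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class AppendTiming
{
    // Times the first version: one concatenation plus one Append per iteration.
    public static long TimeConcat(int iterations)
    {
        var sb = new StringBuilder();
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            sb.Append(i + ",");
        sw.Stop();
        return sw.ElapsedTicks;
    }

    // Times the second version: two Append calls per iteration.
    public static long TimeToString(int iterations)
    {
        var sb = new StringBuilder();
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sb.Append(i.ToString());
            sb.Append(",");
        }
        sw.Stop();
        return sw.ElapsedTicks;
    }

    public static void Main()
    {
        // Warm-up run so JIT compilation is not included in the measurement.
        TimeConcat(1000);
        TimeToString(1000);

        Console.WriteLine("concat:   " + TimeConcat(1000000) + " ticks");
        Console.WriteLine("ToString: " + TimeToString(1000000) + " ticks");
    }
}
```

With only 50 iterations, as in the original loop, the difference is too small to measure reliably, which matches the book's remark that the cost is small.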

Upvotes: 4
