Reputation: 405
struct mydata
{
public int id;
public string data;
}
class Program
{
static void Main(string[] args)
{
List<mydata> myc = new List<mydata>();
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
for (int i = 0; i < 1000000; i++)
{
mydata d = new mydata();
d.id = i;
d.data = string.Format("DataValue {0}",i);
myc.Add(d);
}
stopwatch.Stop();
Console.WriteLine("End: {0}", stopwatch.ElapsedMilliseconds);
}
Whys is this code above so SLOW..?
On an older laptop the times are:
C# code above: 1500ms
Similar code in Delphi: 450ms....
I then changed the code to a KeyValue/Pair (see below):
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
var list = new List<KeyValuePair<int , string>>();
for (int i = 0; i < 1000000; i++)
{
list.Add(new KeyValuePair<int,string>(i, "DataValue" + i));
}
stopwatch.Stop();
Console.WriteLine("End: {0}", stopwatch.ElapsedMilliseconds);
Console.ReadLine();
This improved the time to 1150ms..
If I remove the '+ i' the time is < 300ms
If I try and replace it with a StringBuilder, the timing is similar.
StringBuilder sb = new StringBuilder();
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
var list = new List<KeyValuePair<int, string>>();
for (int i = 0; i < 1000000; i++)
{
sb.Append("DataValue");
sb.Append(i);
list.Add(new KeyValuePair<int, string>(i, sb.ToString()));
sb.Clear();
}
stopWatch.Stop();
Console.WriteLine("End: {0}", stopWatch.ElapsedMilliseconds);
Console.ReadLine();
Is slightly better.. If you remove the sb.Append(i) its very fast..
It would appear that any time you have to add an Int to a string/stringbuilder its VERY SLOW..
Can I speed this up in any way ??
EDIT **
The code below is the quickest I can get after making suggestions:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Threading;
namespace ConsoleApplication1
{
struct mydata
{
public int id;
public string data;
}
class Program
{
static void Main(string[] args)
{
List<mydata> myc = new List<mydata>();
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
for (int i = 0; i < 1000000; i++)
{
mydata d = new mydata();
d.id = i;
d.data = "DataValue " + i.ToString();
myc.Add(d);
}
stopwatch.Stop();
Console.WriteLine("End: {0}", stopwatch.ElapsedMilliseconds);
Console.ReadLine();
}
}
}
If I replace the line:
d.data = "DataValue " + i.ToString();
with:
d.data = "DataValue ";
On my home machine this goes from 660ms -> 31ms..
Yes.. its 630ms slower with the '+ i.ToString()'
But still 2x faster than boxing/string.format etc etc..
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
var list = new List<KeyValuePair<int, string>>();
for (int i = 0; i < 1000000; i++)
{
list.Add(new KeyValuePair<int, string>(i, "DataValue" +i.ToString()));
}
stopwatch.Stop();
Console.WriteLine("End: {0}", stopwatch.ElapsedMilliseconds);
Console.ReadLine();
is 612ms.. (no difference in speed if List>(1000000); is pre-initialised).
Upvotes: 3
Views: 1426
Reputation: 837926
The problem with your first two examples is that the integer must first be boxed and then converted to a string. The boxing causes the code to be slower.
For example, in this line:
d.data = string.Format("DataValue {0}", i);
the second parameter to string.Format
is object
, which causes boxing of i
. See the intermediate language code for confirmation of this:
... box int32 call string [mscorlib]System.String::Format(string, object) ...
Similarly this code:
d.data = "DataValue " + i;
is equivalent to this:
d.data = String.Concat("DataValue ", i);
This uses the overload of String.Concat
with parameters of type object
so again this involves a boxing operation. This can be seen in the generated intermediate language code:
... box int32 call string [mscorlib]System.String::Concat(object, object) ...
For better performance this approach avoids the boxing:
d.data = "DataValue " + i.ToString();
Now the intermediate language code doesn't include the box
instruction and it uses the overload of String.Concat
that takes two strings:
... call instance string [mscorlib]System.Int32::ToString() call string [mscorlib]System.String::Concat(string, string) ...
Upvotes: 10
Reputation: 10516
Another major performance killer in this code is the List. Internally it stores the items in an array. When you call Add, it checks if the new Item can fit into the array (EnsureCapacitiy). When it needs more room, it will create a NEW array with double the size, and then COPY the items from the old array into the new one. You can see all this going on if you check out List.Add in Reflector.
So in your code with 1,000,000 items, it needs to copy the array some 25 times, and they're bigger each time.
If your change your code to
var list = new List<KeyValuePair<int, string>>(1000000);
you should see a dramatic increase in speed. Let us know how you fare!
Regards,
GJ
Upvotes: 0
Reputation: 4909
There are a lot of things wrong with the above which will be affecting your results. First, none of the comparisons you've done are equal. In both you have a list, and use Add, what you add to the list won't affect the time, changing the declaration of the List to var won't affect the time.
I'm not convinced by the boxing argument put up by Mark, this can be a problem, but I'm pretty certain in the first case there is an implicit call to .ToString. This has its own overhead, and would be needed even if the int is boxed.
Format is quite an expensive operation. The second version has a string concatenation which is probably cheaper than a .Format.
The third is just expensive all the way. Using a string builder like that is not efficient. Internally a stringbuilder is just a list. When you do a .ToString on it you essentially do a big concat operation then.
The reason some of the operations might suddenly run really quickly if you take out a critical line is that the compile can optimise out bits of code. If it seems to be doing the same thing over and over it might not do it (Gross over simplification).
Right, so here's my suggestion:
The first version is probably the nearest to being "right" in my mind. What you could do is defer some of the processing. Take the object mydata and set a string property AND an int property. Then only when you need to do the read of the string produce the output via a concat. Save that if you're going to repeat the print operation a lot. It won't necessarilly be quicker in the way you expect.
Upvotes: 0
Reputation: 57936
On my machine:
... String.Format("DataValue {0}", i ) // ~1650ms
... String.Format("DataValue {0}", "") // ~1250ms
... new MyData {Id = i, Data = "DataValue {0}" + i} // ~1200ms
As Mark said, there's a boxing operation involved.
For this specific case, when you get your DataValue based on your id, you could to create a get property or to override ToString() method to do that operation just when you need it.
public override string ToString()
{
return "DataValue {0}" + Id;
}
Upvotes: 1