Reputation: 73163
This is a follow-up to this excellent question, C# Sort and OrderBy comparison. I will use the same example:
List<Person> persons = new List<Person>();
persons.Add(new Person("P005", "Janson"));
persons.Add(new Person("P002", "Aravind"));
persons.Add(new Person("P007", "Kazhal"));
The methods in contention are:
persons.Sort((p1, p2) => string.Compare(p1.Name, p2.Name, true));
//and
persons.OrderBy(n => n.Name);
Let me start by saying that I understand there isn't any significant performance difference to worry about. But I would love to know why OrderBy performs so much better than Sort. I'm using the answer posted by @phoog in the original question.
private void button1_Click(object sender, EventArgs e)
{
    IEnumerable<Person> people;
    BenchMark(persons => persons.Sort((p1, p2) => string.Compare(p1.Name, p2.Name, true)));
    BenchMark(persons => people = persons.OrderBy(n => n.Name));
}
private static Random randomSeed = new Random();

public static string RandomString(int size, bool lowerCase)
{
    var sb = new StringBuilder(size);
    int start = (lowerCase) ? 97 : 65;
    for (int i = 0; i < size; i++)
    {
        sb.Append((char)(26 * randomSeed.NextDouble() + start));
    }
    return sb.ToString();
}
private static void BenchMark(Action<List<Person>> action)
{
    List<Person> persons = new List<Person>();
    for (int i = 0; i < 10000; i++)
    {
        persons.Add(new Person("P" + i.ToString(), RandomString(5, true)));
    }
    List<Person> unsortedPersons = new List<Person>(persons);
    Stopwatch watch = new Stopwatch();
    for (int i = 0; i < 100; i++)
    {
        watch.Start();
        action(persons);
        watch.Stop();
        persons.Clear();
        persons.AddRange(unsortedPersons);
    }
    MessageBox.Show(watch.Elapsed.TotalMilliseconds.ToString());
}
Result:
Sort() => 3500 ~ 5000 ms
OrderBy() => 0.2 ~ 1.5 ms
Though the differences were pronounced even with the smaller lists I tested initially, they became more and more glaring as the size of the collection went up. Maybe I'm missing something key to understanding .NET collections, but my thinking is that since Sort acts on an existing List<T>, it should have less overhead (if any) in processing compared to OrderBy, which acts on the same List<T> (in our case persons) but has to return another collection, IOrderedEnumerable<T>. But still OrderBy performs far, far better. List<T> might have certain overhead compared to the IEnumerable<T> type, but Sort acts on the existing list anyway! Furthermore, I'm a little amused to see a LINQ method working faster than an existing .NET method.
All the answers in the original question compare Sort against OrderBy.ToList, which I believe will have some overhead and therefore performs more or less equally.
What could be the implementation differences?
Edit: OK, I learned something new. Here is how I confirmed the deferred execution.
private void button1_Click(object sender, EventArgs e)
{
    BenchMark(persons =>
    {
        persons.Sort((p1, p2) => string.Compare(p1.Name, p2.Name, true));
        foreach (var item in persons)
        {
            break;
        }
    });
    BenchMark(persons =>
    {
        IEnumerable<Person> people = persons.OrderBy(n => n.Name);
        foreach (var item in people)
        {
            break;
        }
    });
}
Sort ran in 4000 - 5000 ms while OrderBy ran just above 5000 ms. So indeed my conclusion was wrong. Both of them performed on equal terms once I started to enumerate the collections. I prefer the syntax of OrderBy any day :)
Edit 2: I just found that this is an exact duplicate of this one. But here is a more interesting question about deferred execution in general, though not about ordering specifically.
Upvotes: 20
Views: 7118
Reputation: 8032
OrderBy() does not create a sorted list.
It creates an IEnumerable object that, when you enumerate it, generates a sorted list. The actual sorting doesn't happen until you enumerate the list.
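One way to observe this is to count how often the key selector runs. The sketch below is only an illustration: the class name DeferredOrderByDemo and the counter are made up for this example and do not come from the question. It uses ToList() rather than Count() to force the sort, because some newer .NET runtimes may be able to answer Count() on an ordered query without sorting at all (an assumption worth checking on your own runtime).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; not part of the original question.
public static class DeferredOrderByDemo
{
    // Returns how many times the key selector had run
    // before and after the query was enumerated.
    public static (int Before, int After) CountKeySelectorCalls()
    {
        var names = new List<string> { "Janson", "Aravind", "Kazhal" };
        int calls = 0;

        // Building the query only stores the source and the key selector.
        IEnumerable<string> ordered = names.OrderBy(n => { calls++; return n; });
        int before = calls;

        // ToList() enumerates the query, which is when the sorting work runs.
        ordered.ToList();
        int after = calls;

        return (before, after);
    }

    public static void Main()
    {
        var (before, after) = CountKeySelectorCalls();
        Console.WriteLine($"key selector calls before enumeration: {before}");
        Console.WriteLine($"key selector calls after ToList():     {after}");
    }
}
```

The first number should be 0, since nothing has touched the source list yet; the second is non-zero because enumeration had to compute the keys in order to sort.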
Upvotes: 2
Reputation: 564413
In this case, OrderBy is far faster because you're not actually executing it.
Until you enumerate the results, the query is deferred, so it never actually does the ordering: the IOrderedEnumerable<T> doesn't process the input or do any form of ordering until you enumerate through it.
Try changing your benchmark to:
BenchMark(persons => persons.OrderBy(n => n.Name).Count());
The Count() call will force the ordering to actually occur (since it needs to enumerate the IOrderedEnumerable<T> to generate a count), which should even out your timings significantly.
Most LINQ extension methods work this way: until you enumerate them (via Count(), calling ToList(), or just using them in a normal foreach loop, etc.), they will have negligible impact, as they don't actually do anything directly other than build the enumerable. The reason the other benchmarks compare to OrderBy(...).ToList() is that the addition of ToList() forces the OrderBy to fully execute and actually order the results.
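A like-for-like version of the benchmark might look like the console sketch below. Several things here are assumptions rather than the asker's code: the Person class is a stand-in (the question never shows it), the names SortVsOrderBy, MakePeople and Time are invented for the example, and the OrderBy call is given StringComparer.CurrentCultureIgnoreCase so that it compares strings the same way as string.Compare(a, b, true) in the Sort call.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical stand-in for the Person type, which the question does not show.
public sealed class Person
{
    public Person(string id, string name) { Id = id; Name = name; }
    public string Id { get; }
    public string Name { get; }
}

public static class SortVsOrderBy
{
    // Fixed seed so repeated runs sort the same input.
    private static readonly Random Rng = new Random(12345);

    // Builds `count` people with random five-letter lowercase names.
    public static List<Person> MakePeople(int count)
    {
        var people = new List<Person>(count);
        for (int i = 0; i < count; i++)
        {
            var chars = new char[5];
            for (int j = 0; j < chars.Length; j++)
                chars[j] = (char)('a' + Rng.Next(26));
            people.Add(new Person("P" + i, new string(chars)));
        }
        return people;
    }

    // Runs `action` on a fresh unsorted copy each round; only the action is timed.
    public static TimeSpan Time(List<Person> source, int rounds, Action<List<Person>> action)
    {
        var watch = new Stopwatch();
        for (int i = 0; i < rounds; i++)
        {
            var copy = new List<Person>(source);
            watch.Start();
            action(copy);
            watch.Stop();
        }
        return watch.Elapsed;
    }

    public static void Main()
    {
        List<Person> people = MakePeople(10000);

        TimeSpan sort = Time(people, 100, list =>
            list.Sort((a, b) => string.Compare(a.Name, b.Name, true)));

        // ToList() forces the deferred query to run inside the timed region.
        TimeSpan orderBy = Time(people, 100, list =>
            list.OrderBy(p => p.Name, StringComparer.CurrentCultureIgnoreCase).ToList());

        Console.WriteLine($"Sort:             {sort.TotalMilliseconds} ms");
        Console.WriteLine($"OrderBy + ToList: {orderBy.TotalMilliseconds} ms");
    }
}
```

The absolute numbers depend on the machine and runtime, so none are quoted here. The point of the design is only that both timed regions now contain a complete sort.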
Upvotes: 38
Reputation: 887415
OrderBy(), like most LINQ methods, uses deferred execution.
It doesn't actually do anything until you enumerate its results.
To properly measure its performance, you can call .OrderBy(...).Count().
Upvotes: 12