Reputation: 33
While writing a solution for a coding problem I discovered an interesting behavior of my LINQ statements. I had two scenarios:
First:
arr.Select(x => x + 5).OrderBy(x => x)
Second:
arr.OrderBy(x => x).Select(x => x + 5)
After a little bit of testing with System.Diagnostics.Stopwatch I got the following results for an integer array of length 100_000.
For the first approach:
00:00:00.0000152
For the second:
00:00:00.0073650
Now I'm interested in why it takes more time if I do the ordering first. I wasn't able to find something on google so I thought about it by myself.
I ended up with 2 Ideas:
1. The second scenario has to convert to IOrderedEnumerable and then back to IEnumerable while the first scenario only has to convert to IOrderedEnumerable and not back.
2. You end up having 2 loops. The first for sorting and the second for the selecting while approach 1 does everything in 1 loop.
So my question is why does it take much more time to do the ordering before select?
Upvotes: 3
Views: 921
Reputation: 186678
Let's have a look at the sequences:
private static void UnderTestOrderBySelect(int[] arr) {
var query = arr.OrderBy(x => x).Select(x => x + 5);
foreach (var item in query)
;
}
private static void UnderTestSelectOrderBy(int[] arr) {
var query = arr.Select(x => x + 5).OrderBy(x => x);
foreach (var item in query)
;
}
// See Marc Gravell's comment; let's compare Linq and inplace Array.Sort
private static void UnderTestInPlaceSort(int[] arr) {
var tmp = arr;
var x = new int[tmp.Length];
for (int i = 0; i < tmp.Length; i++)
x[i] = tmp[i] + 5;
Array.Sort(x);
}
In order to perform benchmark, let's run 10
times and average 6
middle results:
private static string Benchmark(Action<int[]> methodUnderTest) {
List<long> results = new List<long>();
int n = 10;
for (int i = 0; i < n; ++i) {
Random random = new Random(1);
int[] arr = Enumerable
.Range(0, 10000000)
.Select(x => random.Next(1000000000))
.ToArray();
Stopwatch sw = new Stopwatch();
sw.Start();
methodUnderTest(arr);
sw.Stop();
results.Add(sw.ElapsedMilliseconds);
}
var valid = results
.OrderBy(x => x)
.Skip(2) // get rid of top 2 runs
.Take(results.Count - 4) // get rid of bottom 2 runs
.ToArray();
return $"{string.Join(", ", valid)} average : {(long) (valid.Average() + 0.5)}";
}
Time to run and have a look at the results:
string report = string.Join(Environment.NewLine,
$"OrderBy + Select: {Benchmark(UnderTestOrderBySelect)}",
$"Select + OrderBy: {Benchmark(UnderSelectOrderBy)}",
$"Inplace Sort: {Benchmark(UnderTestInPlaceSort)}");
Console.WriteLine(report);
Outcome: (Core i7 3.8GHz, .Net 4.8 IA64)
OrderBy + Select: 4869, 4870, 4872, 4874, 4878, 4895 average : 4876
Select + OrderBy: 4763, 4763, 4793, 4802, 4827, 4849 average : 4800
Inplace Sort: 888, 889, 890, 893, 896, 904 average : 893
I don't see any significant difference, Select + OrderBy
seems to be slightly more efficient (about 2% gain) than OrderBy + Select
. Inplace Sort, however, has far better performance (5
times) than any of Linq.
Upvotes: 3
Reputation: 37000
Depending on which Linq-provider you have, there may happen some optimization on the query. E.g. if you´d use some kind of database, chances are high your provider would create the exact same query for both statements similar to this one:
select myColumn from myTable order by myColumn;
Thus performamce should be identical, no matter if you order first in Linq or select first.
As this does not seem to happen here, you probably use Linq2Objects, which has no optimization at all. So the order of your statements may have an efffect, in particular if you´d have some kind of filter using Where
which would filter many objects out so that later statements won´t operate on the entire collection.
To keep long things short: the difference most probably comes from some internal initialzation-logic. As a dataset of 100000 numbers is not really big - at least not big enough - even some fast initialization has a big impact.
Upvotes: 2