Reputation: 81
I tried doing the same thing using Linq and non-Linq methods and found out that Linq is significantly slower (~3000x).
Why is that?
Linq way:
for (int i = 0; i < totalElements; i += stepSize)
{
var currentBlock = testList
.Skip(i)
.Take(stepSize);
result.Add(currentBlock.Sum());
}
result.ToList();
Non-Linq way:
for (int i = 0; i < totalElements; i += stepSize)
{
var currentBlock = testList.GetRange(i, stepSize);
result2.Add(currentBlock.Sum());
}
result2.ToList();
Results:
Method: Linq, Time taken: 26667 ms, Elements: 1000000, Step Size: 100
Method: GetRange, Time taken: 9 ms, Elements: 1000000, Step Size: 100
Full source code as requested:
static void Main(string[] args)
{
var totalElements = 1000000;
var testList = new List<int>(totalElements);
var rand = new Random();
// Initialize the list to random integers between 1 and 1000
for (int i = 0; i < totalElements; i++)
{
testList.Add(rand.Next(1, 1000));
}
var result = new List<int>();
var stepSize = 100;
var stp = new Stopwatch();
stp.Start();
for (int i = 0; i < totalElements; i += stepSize)
{
var currentBlock = testList
.Skip(i)
.Take(stepSize);
result.Add(currentBlock.Sum());
}
result.ToList();
stp.Stop();
Console.WriteLine($"Method: Linq, Time taken: {stp.ElapsedMilliseconds} ms, Elements: {totalElements}, Step Size: {stepSize}");
stp.Reset();
var result2 = new List<int>();
stp.Start();
for (int i = 0; i < totalElements; i += stepSize)
{
var currentBlock = testList.GetRange(i, stepSize);
result2.Add(currentBlock.Sum());
}
result2.ToList();
stp.Stop();
Console.WriteLine($"Method: GetRange, Time taken: {stp.ElapsedMilliseconds} ms, Elements: {totalElements}, Step Size: {stepSize}");
}
Upvotes: 0
Views: 594
Reputation: 30464
GetRange uses Skip(). It always starts enumerating at the beginning. What you'd like to have is a function that divides you sequence into chunks without iterating over the sequence more than really needed.
This means that if you only want the first Chunk the function should not iterate over more than this Chunk, and if I want the 10th chunk after the 9th Chunk, it should not start iterating at the beginning.
How about this extension function?
public static IEnumerable<IEnumerable<Tsource>> ToChuncks<TSource>(
this IEnumerable<TSource> source, int chunkSize)
{
while (source.Any()) // while there are elements left
{ // still something to chunk
// yield return a chunk
yield return source.Take(chunkSize); // return a chunk of chunkSize
// remove the chunk from the source
source = source.Skip(chunkSize); // skip the returned chunk
}
}
This function repeatedly checks if there is still something left in source sequence. If so, it yield returns a Chunk of data and removed the chunk from your source.
This way your complete source is iterated at utmost twice: once if you iterate over the the elements in a Chunk, and once if you iterate over the Chunks.
Upvotes: 0
Reputation: 32750
The problem is how Skip
works, which is radically different from GetRange
. Skip
always starts at the beginning of the enumeration which means you are doing the following:
Iteration #1: Skip 0
Iteration #2: Skip 1 * step
Iteration #3: Skip 2 * step
Iteration #4: Skip 3 * step
Iteration #5: Skip 4 * step
....
Iteration #1.000: Skip 9.999 * step
If you do the maths for 1.000.000 elements and a step
of 100
you get:
sum = 1 + 2 + 3 + .... + 9.999 = 9.999 * (9.999 + 1) / 2 = 49.995.000
total elements skipped: 49.995.000 * 100 = 4.999.500.000
So, your Linq version has a whopping 4.999.500.000
unnecessary iterations.
A good question here would be: why hasn't Skip
been optimized for cases where source
implements IList<T>
, because, plainly, it would be possible.
Upvotes: 5