dipen
dipen

Reputation: 81

Why is Linq significantly slower?

I tried doing the same thing using Linq and non-Linq methods and found out that Linq is significantly slower (~3000x).

Why is that?

Linq way:

for (int i = 0; i < totalElements; i += stepSize)
{
    var currentBlock = testList
        .Skip(i)
        .Take(stepSize);

    result.Add(currentBlock.Sum());
}

result.ToList();

Non-Linq way:

for (int i = 0; i < totalElements; i += stepSize)
{
    var currentBlock = testList.GetRange(i, stepSize);

    result2.Add(currentBlock.Sum());
}

result2.ToList();

Results:

Full source code as requested:

static void Main(string[] args)
{
    var totalElements = 1000000;
    var testList = new List<int>(totalElements);
    var rand = new Random();

    // Initialize the list to random integers between 1 and 1000
    for (int i = 0; i < totalElements; i++)
    {
        testList.Add(rand.Next(1, 1000));
    }

    var result = new List<int>();
    var stepSize = 100;
    var stp = new Stopwatch();

    stp.Start();
    for (int i = 0; i < totalElements; i += stepSize)
    {
        var currentBlock = testList
            .Skip(i)
            .Take(stepSize);

        result.Add(currentBlock.Sum());
    }

    result.ToList();
    stp.Stop();

    Console.WriteLine($"Method: Linq, Time taken: {stp.ElapsedMilliseconds} ms, Elements: {totalElements}, Step Size: {stepSize}");

    stp.Reset();

    var result2 = new List<int>();
    stp.Start();

    for (int i = 0; i < totalElements; i += stepSize)
    {
        var currentBlock = testList.GetRange(i, stepSize);

        result2.Add(currentBlock.Sum());
    }

    result2.ToList();
    stp.Stop();

    Console.WriteLine($"Method: GetRange, Time taken: {stp.ElapsedMilliseconds} ms, Elements: {totalElements}, Step Size: {stepSize}");
}

Upvotes: 0

Views: 594

Answers (2)

Harald Coppoolse
Harald Coppoolse

Reputation: 30464

GetRange uses Skip(). It always starts enumerating at the beginning. What you'd like to have is a function that divides you sequence into chunks without iterating over the sequence more than really needed.

This means that if you only want the first Chunk the function should not iterate over more than this Chunk, and if I want the 10th chunk after the 9th Chunk, it should not start iterating at the beginning.

How about this extension function?

public static IEnumerable<IEnumerable<Tsource>> ToChuncks<TSource>(
    this IEnumerable<TSource> source, int chunkSize)
{
    while (source.Any())                 // while there are elements left
    {   // still something to chunk
        // yield return a chunk
        yield return source.Take(chunkSize); // return a chunk of chunkSize

        // remove the chunk from the source
        source = source.Skip(chunkSize);     // skip the returned chunk
    }
}

This function repeatedly checks if there is still something left in source sequence. If so, it yield returns a Chunk of data and removed the chunk from your source.

This way your complete source is iterated at utmost twice: once if you iterate over the the elements in a Chunk, and once if you iterate over the Chunks.

Upvotes: 0

InBetween
InBetween

Reputation: 32750

The problem is how Skip works, which is radically different from GetRange. Skip always starts at the beginning of the enumeration which means you are doing the following:

Iteration #1: Skip 0
Iteration #2: Skip 1 * step
Iteration #3: Skip 2 * step
Iteration #4: Skip 3 * step
Iteration #5: Skip 4 * step
....
Iteration #1.000: Skip 9.999 * step

If you do the maths for 1.000.000 elements and a step of 100 you get:

sum = 1 + 2 + 3 + .... + 9.999 = 9.999 * (9.999 + 1) / 2 = 49.995.000
total elements skipped: 49.995.000 * 100 = 4.999.500.000

So, your Linq version has a whopping 4.999.500.000 unnecessary iterations.

A good question here would be: why hasn't Skip been optimized for cases where source implements IList<T>, because, plainly, it would be possible.

Upvotes: 5

Related Questions