axe
axe

Reputation: 2371

LINQ query is slow

During application profiling I found that function checking for pattern match is very slow. It is written using LINQ. Simple replacement of this LINQ expression with loop makes really huge difference. What is it? Is LINQ really such a bad thing and works so slow or I misunderstand something?

private static bool PatternMatch1(byte[] buffer, int position, string pattern)
{
    int i = 0;

    foreach (char c in pattern)
    {
        if (buffer[position + i++] != c)
        {
            return false;
        }
    }

    return true;
}    

version 2 with LINQ (suggested by Resharper)

private static bool PatternMatch2(byte[] buffer, int position, string pattern)
{
    int i = 0;
    return pattern.All(c => buffer[position + i++] == c);
}

version 3 with LINQ

private static bool PatternMatch3(byte[] buffer, int position, string pattern)
{
    return !pattern.Where((t, i) => buffer[position + i] != t).Any();
}

version 4 using lambda

private static bool PatternMatch4(byte[] buffer, int position, string pattern, Func<char, byte, bool> predicate)
{
    int i = 0;

    foreach (char c in pattern)
    {
        if (predicate(c, buffer[position + i++]))
        {
            return false;
        }
    }

    return true;
}

and here is the usage with large buffer

const int SIZE = 1024 * 1024 * 50;

byte[] buffer = new byte[SIZE];

for (int i = 0; i < SIZE - 3; ++i)
{
    if (PatternMatch1(buffer, i, "xxx"))
    {
        Console.WriteLine(i);
    }
}

Calling PatternMatch2 or PatternMatch3 is phenomenally slow. It takes about 8 seconds for PatternMatch3 and about 4 seconds for PatternMatch2, while calling PatternMatch1 takes about 0.6. As far as I understand it is the same code and I see no difference. Any ideas?

Upvotes: 8

Views: 4558

Answers (4)

Marco Mp
Marco Mp

Reputation: 422

Well let's take the Where operator.

It's implementation is almost(*) like:

public IEnumerable<T> Where(this IEnumerable<T> input, Func<T, bool> fn)
{
    foreach(T i in input) 
        if (fn(i))
            yield return i;
}

This means, for each loop over the IEnumerable an iterator object is created - note that you have SIZE - n of these allocations because you do those many LINQ queries.

Then for each character in the pattern you have:

  1. A function call to the enumerator
  2. A function call to the predicate

The second is a call to a delegate which costs roughly double the calling cost of a typical virtual call (where in the first version you have no extra calls apart the array deindexing.

If you really want brute performance, you probably want to get as "older-style" code as possible. In this case I would even replace that foreach in method 1 with a plain for (at the very least if it does not serve to optimize, it makes it more readable since you are keeping track of the index anyway).

It's also more readable in this case, and it shows that Resharper suggestions are sometimes debatable.

(*) I said almost because it uses a proxy method to check that the input enumerable is not null and throw the exception earlier than the time the collection is enumerated -- it's a small detail which does not invalidate what I wrote before, highlighting here for completeness.

Upvotes: 6

Julien Lebosquain
Julien Lebosquain

Reputation: 41213

Mark Byers and Marco Mp are right about the extra calls overhead. However, there is another reason here: many object allocations, due to a closure.

The compiler creates a new object on each iteration, storing the current values of buffer, position and i that the predicate can use. Here is a dotTrace snapshot of PatternMatch2 showing this. <>c_DisplayClass1 is the compiler-generated class.

(Note that the numbers are huge because I'm using tracing profiling, which adds overhead. However, it's the same overhead per call, so the overall percentages are correct).

enter image description here

Upvotes: 7

Mark Byers
Mark Byers

Reputation: 837996

The difference is that the LINQ version has extra function calls. Internally in LINQ your lambda function is called in a loop.

While the extra call might be optimized away by the JIT compiler, it's not guaranteed and it can add a small overhead. In most cases the extra overhead won't matter, but because your function is very simple and is called an extremely large number of times, even a small overhead can quickly add up. In my experience LINQ code generally is a little bit slower than equivalent for loops. That's the performance price you often pay for a more high level syntax.

In this situation, I would recommend sticking with the explicit loop.

While you're optimizing this section of code, you might also want to consider a more efficient searching algorithm. Your algorithm is worst case O(n*m) but better algorithms exist.

Upvotes: 6

Wormbo
Wormbo

Reputation: 4992

LINQ's main goal when applied to collections is its simplicity. If you want performance, you should avoid LINQ entirely. Also to enumerate an array you just increment an index variable, while LINQ needs to set up an entire enumerator object.

Upvotes: 1

Related Questions