Dan
Dan

Reputation: 1190

Speed improvement in LINQ Where(Array.Contains)

I initially had a method that contained a LINQ query returning int[], which then got used later in a fashion similar to:

int[] result = something.Where(s => previousarray.Contains(s.field));

This turned out to be horribly slow, until the first array was retrieved as the native IQueryable<int>. It now runs very quickly, but I'm wondering how I'd deal with the situation if I was provided an int[] from elsewhere which then had to be used as above.

Is there a way to speed up the query in such cases? Converting to a List doesn't seem to help.

Upvotes: 4

Views: 5216

Answers (2)

drch
drch

Reputation: 3070

In LINQ-SQL, a Contains will be converted to a SELECT ... WHERE field IN(...) and should be relatively fast. In LINQ-Objects however, it will call ICollection<T>.Contains if the source is an ICollection<T>.

When a LINQ-SQL result is treated as an IEnumerable instead of an IQueryable, you lose the linq provider - i.e., any further operations will be done in memory and not in the database.

As for why its much slower in memory:

Array.Contains() is an O(n) operation so

something.Where(s => previousarray.Contains(s.field));

is O(p * s) where p is the size of previousarray and s is the size of something.

HashSet<T>.Contains() on the other hand is an O(1) operation. If you first create a hashset, you will see a big improvement on the .Contains operation as it will be O(s) instead of O(p * s).

Example:

var previousSet = new HashSet<int>(previousarray);
var result = something.Where(s => previousSet.Contains(s.field));

Upvotes: 20

gleb.kudr
gleb.kudr

Reputation: 1508

Where on Lists/Arrays/IEnumarables etc is O[N] operation. It is O[~1] on HashSet. So you should try to use it.

Upvotes: 1

Related Questions