Filip Ekberg

Reputation: 36287

Converting IEnumerable<T> to List<T> on a LINQ result, huge performance loss

Given a LINQ result like this:

var result = from x in Items select x;
List<T> list = result.ToList<T>();

However, ToList<T> is really slow. Does it make the list mutable, and is that why the conversion is slow?

In most cases I can get by with just my IEnumerable or a Paralell.DistinctQuery, but now I want to bind the items to a DataGridView, so I need something other than an IEnumerable. Any suggestions on how to improve the performance of ToList, or on a replacement for it?

With 10 million records in the IEnumerable, .ToList<T>() takes about 6 seconds.

Upvotes: 1

Views: 5034

Answers (4)

Mark Byers

Reputation: 838096

It's because LINQ likes to be lazy and do as little work as possible. This line:

var result = from x in Items select x;

despite your choice of name, isn't actually a result, it's just a query object. It doesn't fetch any data.

List<T> list = result.ToList<T>();

Now you've actually requested the result, hence it must fetch the data from the source and make a copy of it. ToList guarantees that a copy is made.

With that in mind, it's hardly surprising that the second line is much slower than the first.
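The deferred behaviour can be observed directly with a small sketch. The Items method and the Produced counter below are hypothetical stand-ins for the question's data source, introduced only to make the enumeration visible; they are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    // Counts how many elements the source has actually produced.
    public static int Produced;

    // Hypothetical stand-in for the question's Items: yields 0..count-1 lazily.
    public static IEnumerable<int> Items(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Produced++;
            yield return i;
        }
    }

    public static void Main()
    {
        // Building the query does not enumerate the source.
        var result = from x in Items(5) select x;
        Console.WriteLine(Produced);   // 0: nothing has been fetched yet

        // ToList enumerates the whole source and copies it into a new list.
        List<int> list = result.ToList();
        Console.WriteLine(Produced);   // 5
        Console.WriteLine(list.Count); // 5
    }
}
```

Timing only the ToList line therefore measures the cost of producing every element, not just the cost of building the list.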

Upvotes: 7

Mehrdad Afshari

Reputation: 421978

.ToList() is slow in comparison to what?

If you are comparing

var result = from x in Items select x;
List<T> list = result.ToList<T>();

to

var result = from x in Items select x;

you should note that since the query is evaluated lazily, the first line doesn't do much at all. It doesn't retrieve any records. Deferred execution makes this comparison completely unfair.

Upvotes: 10

Guffa

Reputation: 700242

No, it's not creating the list that takes time, it's fetching the data that takes time.

Your first code line doesn't actually fetch the data, it only sets up an IEnumerable that is capable of fetching the data. It's when you call the ToList method that it will actually get all the data, and that is why all the execution time is in the second line.

You should also consider if having ten million lines in a grid is useful at all. No user is ever going to look through all the lines, so there isn't really any point in getting them all. Perhaps you should offer a way to filter the result before getting any data at all.
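One way to act on this is to filter and page the query before calling ToList, so that only the rows the grid will show are materialized. This is a minimal sketch, not a definitive design: GetPage is a hypothetical helper, the even-number filter is an arbitrary example, and Enumerable.Range stands in for the question's Items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PagingDemo
{
    // Hypothetical helper: materializes only one page of a lazy sequence.
    public static List<T> GetPage<T>(IEnumerable<T> source, int pageIndex, int pageSize)
    {
        return source.Skip(pageIndex * pageSize).Take(pageSize).ToList();
    }

    public static void Main()
    {
        // Stand-in for the question's Items; Enumerable.Range is lazy.
        IEnumerable<int> items = Enumerable.Range(0, 10000000);

        // Filter first, then fetch only the rows the grid will display.
        var evens = items.Where(x => x % 2 == 0);
        List<int> page = GetPage(evens, 1, 100);

        Console.WriteLine(page.Count); // 100
        Console.WriteLine(page[0]);    // 200
    }
}
```

The resulting List<T> can then be assigned to the grid's DataSource. Note that on an in-memory sequence Skip still walks past the skipped elements, so later pages cost more than earlier ones; a query provider that translates Skip and Take to the data store may avoid that.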

Upvotes: 2

Anton Gogolev

Reputation: 115721

I think it's because of memory reallocations: ToList cannot know the size of a lazily evaluated sequence beforehand, so it cannot allocate enough storage for all the items up front. Therefore, it has to reallocate the List<T>'s internal storage as it grows.

If you can estimate the size of your result set, it may be faster to preallocate enough capacity using the List<T>(int) constructor overload and then add the items to it manually.
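A minimal sketch of that suggestion follows. ToListWithCapacity is a hypothetical helper name, and the estimate of 500 is chosen by hand for this example. Whether this helps measurably depends on where the time actually goes; if most of it is spent producing the elements, the gain is likely to be small.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PreallocateDemo
{
    // Hypothetical helper: copies a sequence into a list created
    // with an estimated capacity, so the list need not regrow.
    public static List<T> ToListWithCapacity<T>(IEnumerable<T> source, int estimatedCount)
    {
        var list = new List<T>(estimatedCount);
        foreach (var item in source)
        {
            list.Add(item);
        }
        return list;
    }

    public static void Main()
    {
        // A lazy query whose length is not known to ToList in advance.
        var result = from x in Enumerable.Range(0, 1000) where x % 2 == 0 select x;

        List<int> list = ToListWithCapacity(result, 500);
        Console.WriteLine(list.Count);    // 500
        Console.WriteLine(list.Capacity); // 500: no regrowth was needed
    }
}
```

If the estimate is too low, the list still grows as usual, so a wrong guess costs only the reallocations it failed to avoid.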

Upvotes: 0
