Martin Lottering
Martin Lottering

Reputation: 1792

Is T[] not better than IEnumerable<T> as parameter type? (Considering threading challenges)

Some programmers argue that it is better to pass IEnumerable<T> parameters over passing implementations like List<T>, and one of the popular reasons to do this is that the API is immediately usable in more scenarios because more collections will be compatible with IEnumerable<T> than any other specific implementation, e.g. List<T>.

The more I dive into multi-threaded development scenarios, the more I am starting to think that IEnumerable<T> is not the correct type either and I will try to explain why below.

Have you ever received the following exception while enumerating a collection?

Collection was modified; enumeration operation may not execute. (an InvalidOperationException)

Collection was modified exception

Basically what causes this is, you are given the collection on one thread, while someone else modifies the collection on another thread.

To circumvent the problem, I then developed a habit to take a snapshot of the collection before I enumerate it, by converting the collection to an array inside the method, like this:

static void LookAtCollection(IEnumerable<int> collection)
{   
    foreach (int Value in collection.ToArray() /* take a snapshot by converting to an array */)
    {
        Thread.Sleep(ITEM_DELAY_MS);
    }
}

My question is this. Wouldn't it be better to code towards arrays instead of enumerables as a general rule, even if it means that callers are now forced to convert their collections to arrays when using your API?

Is this not cleaner and more bullet-proof? (the parameter type is an array instead)

static void LookAtCollection(int[] collection)
{   
    foreach (int Value in collection)
    {
        Thread.Sleep(ITEM_DELAY_MS);
    }
}

The array meets all the requirements of being enumerable and of a fixed length, and the caller is aware of the fact that you will be operating on a snapshot of the collection at that point in time, which can save more bugs.

The only better alternative I can find right now is the IReadOnlyCollection which will then be even more bullet proof because the collection is then also readonly in terms of item-content.

EDIT:

@DanielPryden provided a link to a very nice article "Arrays considered somewhat harmful". And the comments made by the writer "I rarely need a collection which has the rather contradictory properties of being completely mutable, and at the same time, fixed in size" and "In almost every case, there is a better tool to use than an array." kind of convinced me that arrays are not as close to the silver bullet as I had hoped for, and I agree with the risks and loopholes. I think now the IReadOnlyCollection<T> interface is a better alternative than both the array and the IEnumerable<T>, but it kind of leaves me with the question now: Does the callee enforce it by having a IReadOnlyCollection<T> parameter type in the method declaration? Or should the caller still be allowed to decide what implementation of IEnumerable<T> he passes into the method that looks at the collection?

Thanks for all the answers. I learned a lot from these responses.

Upvotes: 9

Views: 602

Answers (5)

supercat
supercat

Reputation: 81247

The real "problem" here is that while T[] is probably a more specific parameter type than would be ideal, and allows the recipient free access to write any element (which may or may not be desirable), IEnumerable<T> is too general. From a type-safety standpoint, the caller can supply a reference to any object which implements IEnumerable<T>, but in reality only certain such objects will actually work and there's no clean way for the caller to know which ones those are.

If T is a reference type, a T[] will be inherently thread-safe for reading, writing, and enumeration, subject to certain conditions (e.g. threads accessing different elements will not interfere at all; if one thread writes an item around the time another thread reads or enumerates it, the latter thread will see either old or new data. None of Microsoft's collection interfaces offer any such guarantees, nor do they provide any means by which a collection can indicate what guarantees it can or cannot make.

My inclination would be to either use T[], or else define an IThreadSafeList<T> where T:class which would include members like CompareExchangeItem that would allow Interlocked.CompareExchange on an item, but would not include things like Insert and Remove which cannot be done in thread-safe fashion.

Upvotes: 4

to StackOverflow
to StackOverflow

Reputation: 124776

Most methods are not thread-safe unless explicitly documented otherwise.

In this case I'd say it's up to the caller to take care of thread-safety. If the caller passes an IEnumerable<T> parameter to a method, he should hardly be surprised if the method enumerates it! If the enumeration is not thread-safe, the caller needs to create a snapshot in a thread-safe manner, and pass in the snapshot.

The callee can't do this by calling ToArray() or ToList() - these methods will also enumerate the collection, and the callee can't know what is required (e.g. a lock) to make a snapshot in a thread-safe way.

Upvotes: 4

Ryszard Dżegan
Ryszard Dżegan

Reputation: 25434

One of the idea behind IEnumerable is that it is a result of a LINQ query. Since the query is rather than declaration instead of executed code, the result of IEnumerable is only invoked while iterating in foreach or copying items to an Array or other structure. Now if we would like to store the result in a static structure like an Array, we will have to update it prior to iteration as well as IEnumerable or we will receive out-of-date data. IEnumerable.ToArray means: Update the collection and make the static copy of it. So my answer is, that if we have to do update in advance of operating on the collection, the IEnumerable is more concise. In other scenarios we can use Array in order to simplify code.

Upvotes: 0

Tschallacka
Tschallacka

Reputation: 28742

Threads should never be allowed to access the same objects as other threads. You should always duplicate the variables/objects/arrays and parse them at the threads leisure.

When it's finished it should call back to the main thread so that thread can either update the current values/merge the values.

If two objects can reach the same object at the same time you get or a tug of war, or exceptions or strange behaviour.

Imagine threads like this: You have the boss and two employees. The boss makes copies of the letter he has in his hand and gives it to the employees. The employees each has his own tasks, one to collect everything on the list, the other to check if the items on the list need to be ordered from suppliers. They each go in their own private rooms and the boss just puts the paper on the desk where he sometimes checks it to tell someone how much there at this moment is.

employee one returns with the ordered objects and gives them to the director who passes it on to the logistics department. An hour later employee two returns with an updated list with stock quantities. The director throws out the old list and replaces it with the new list.

Now the scenario where there is one list.

The boss gives the same instructions. Employee one takes the list and starts calling the first supplier. Employee 2 takes the list and starts picking items. The director wants to look at the list but the list is gone and the entire sales funnel comes to a grinding halt because there is no accurate information because the list is occupied. The first employee has finished calling and wants to update his list. He can't because the list is occupied and he can't continue with the next item because the list is gone so he's stuck twiddling his thumbs or throwing a fit. When employee two is ready with his list employee takes the list and tries to finish calling order statusses but the director storms in and takes the list and passes on the information to the sales team. The director and the employee both start to fight over the list resulting in that no work is done and the company dies.

That is why you always need to work with independant copies in threads unless you know only that thread exclusively needs access to that resource, because you never know when that specific thread is going to do something with that resource because you don't control CPU timing.

I hope this helps a bit.

Upvotes: 1

Grant Thomas
Grant Thomas

Reputation: 45068

This problem isn't specific to threading. Try modifying a collection within a loop that explicitly or implicitly calls GetEnumerator. The result is the same. It's an illegal operation.

Upvotes: 2

Related Questions