Michael Graczyk
Michael Graczyk

Reputation: 4965

Mistake in List<T>.InsertRange(), or design decision?

I was reading the code for List<T>.InsertRange() in the .NET 4.0 framework, when I noticed a strange peculiarity. Here is the code for reference:

    public void InsertRange(int index, IEnumerable<T> collection) {
        if (collection==null) { 
            ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
        } 

        if ((uint)index > (uint)_size) {
            ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.index, ExceptionResource.ArgumentOutOfRange_Index); 
        }
        Contract.EndContractBlock();

        ICollection<T> c = collection as ICollection<T>; 
        if( c != null ) {    // if collection is ICollection<T>
            int count = c.Count; 
            if (count > 0) { 
                EnsureCapacity(_size + count);
                if (index < _size) { 
                    Array.Copy(_items, index, _items, index + count, _size - index);
                }

                // If we're inserting a List into itself, we want to be able to deal with that. 
                if (this == c) {
                    // Copy first part of _items to insert location 
                    Array.Copy(_items, 0, _items, index, index); 
                    // Copy last part of _items back to inserted location
                    Array.Copy(_items, index+count, _items, index*2, _size-index); 
                }
                else {
                    T[] itemsToInsert = new T[count];
                    c.CopyTo(itemsToInsert, 0); 
                    itemsToInsert.CopyTo(_items, index);
                } 
                _size += count; 
            }
        } 
        else {
            using(IEnumerator<T> en = collection.GetEnumerator()) {
                while(en.MoveNext()) {
                    Insert(index++, en.Current); 
                }
            } 
        } 
        _version++;
    }

In particular, notice that _version is always incremented at the end of the function. This means that in-progress enumerations over the list will be invalidated at any non-exceptional call to InsertRange, even if the List was not changed. For instance, the following code throws:

static void Main(string [] args) {
    var list = new List<object>() {1, 2 };


    using(var enumerator = list.GetEnumerator()) {
    if(enumerator.MoveNext())
        Console.WriteLine(enumerator.Current);

    list.InsertRange(1, new object[]{});


    if(enumerator.MoveNext()) // ** InvalidOperationException
        Console.WriteLine(enumerator.Current);
    }
}

Modifying the method so that enumeration is not invalidated in this way would not increase execution time at all because the code already checks the size of count. It could be rewritten as follows:

public void InsertRange(int index, IEnumerable<T> collection) {
    if (collection==null) { 
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
    } 

    if ((uint)index > (uint)_size) {
        ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.index, ExceptionResource.ArgumentOutOfRange_Index); 
    }
    Contract.EndContractBlock();

    ICollection<T> c = collection as ICollection<T>; 
    if( c != null ) {    // if collection is ICollection<T>
        int count = c.Count; 
        if (count > 0) { 
            EnsureCapacity(_size + count);
            if (index < _size) { 
                Array.Copy(_items, index, _items, index + count, _size - index);
            }

            // If we're inserting a List into itself, we want to be able to deal with that. 
            if (this == c) {
                // Copy first part of _items to insert location 
                Array.Copy(_items, 0, _items, index, index); 
                // Copy last part of _items back to inserted location
                Array.Copy(_items, index+count, _items, index*2, _size-index); 
            }
            else {
                T[] itemsToInsert = new T[count];
                c.CopyTo(itemsToInsert, 0); 
                itemsToInsert.CopyTo(_items, index);
            } 
            _size += count;
            _version++;
        }
    } 
    else {
        var inserted = false;

        using(IEnumerator<T> en = collection.GetEnumerator()) {
            while(en.MoveNext()) {
                inserted = true;
                Insert(index++, en.Current); 
            }  
        }

        if (inserted) _version++; 
    } 
}

Which has as its only disadvantage an extra local variable (which will probably be JITed into a register), maybe 20 bytes increase in the working set, and an irrelevant amount of extra CPU work when inserting IEnumerables. If the extra bool or the in-loop assignment need to be avoided, the insertion for IEnumerables could be performed as

if(en.MoveNext()) {
    Insert(index++, en.Current); 
    _version++;
}

while(en.MoveNext()) {
    Insert(index++, en.Current); 
}  

So...

Is the .NET implementation the intended behavior, or is it a mistake?

EDIT:

I realize that if you are enumeration on one thread while modifying the thread on another, you are doing something wrong. According to the documentation, the behavior in these cases is undefined. However, List<T> does the programmer a favor and throws an exception in these cases. I am not asking if List<T> follows the documentation correctly: it does. I am asking if it is implemented in a way that is not what was intended by Microsoft.

If InsertRange() is behaving as intended, then List<T> has inconsistent behavior. The RemoveRange() method only invalidates enumeration if items were actually removed:

static void Main(string [] args) {
    var list = new List<object>() {1, 2 };

    using(var enumerator = list.GetEnumerator()) {
    if(enumerator.MoveNext())
        Console.WriteLine(enumerator.Current);

    list.RemoveRange(1, 0);


    if(enumerator.MoveNext()) // ** Does not throw
        Console.WriteLine(enumerator.Current);
    }
}

Upvotes: 3

Views: 1328

Answers (3)

Paul Phillips
Paul Phillips

Reputation: 6259

I would guess that this is intentional. C# follows a "pit of success" design - they want it to be hard to make a mistake.

Ultimately, the existing design makes it easier to analyze the use of that method. What do I mean by that? Well:

The example you cite is trivial, at a glance you can tell it isn't really modifying the list. But in almost all real-world code that isn't the case. The sequence being inserted would almost certainly be dynamically created, and could be an empty sequence almost randomly. Should an empty sequence really behave differently? Your code would work if the sequence inserted was empty, but the moment you put something real in there, ka-boom.

Imagine if you first write this code and all your sequences are empty; looks like it works. Then, some arbitrary time later, you have a non-empty insert. Now you get exceptions. The distance between you inserting the problem and detecting it is potentially very large.

With it throwing on any successful call, that failure mode becomes easier to detect.

Upvotes: 3

Erik Funkenbusch
Erik Funkenbusch

Reputation: 93424

The idea here is that you should not write code that reads from the list while modifying it, even if your call doesn't actually modify the list, a different call might. It's safest to simply invalidate the list every time, even if once in a while it might be less efficient.

Upvotes: 0

Wormbo
Wormbo

Reputation: 4992

It's probably intended. The idea when calling a function to insert items is to modify the list. The case where the list ends up not being modified is an exception, but the intention was to modify it. I guess here the intention counts. If someone calls InsertRange while simultaneously iterating over the list, then there's a conceptual problem already.

Personally I'm more a fan of the Java collection framework here - the ListIterator class can modify the list through its own Add, Set and Remove methods. Even the general Iterator can remove items from the iterated collection. But as far as I know, Java also considers any modification attempts other than those from the iterator itself as operations that invalidate the (list)iterator.

Upvotes: 1

Related Questions