Jonas Elfström
Jonas Elfström

Reputation: 31428

List<T> and IEnumerable difference

While implementing this generic merge sort, as a kind of Code Kata, I stumbled on a difference between IEnumerable and List that I need help to figure out.

Here's the MergeSort

public class MergeSort<T>
{
    public IEnumerable<T> Sort(IEnumerable<T> arr)
    {
        if (arr.Count() <= 1) return arr;

        int middle = arr.Count() / 2;
        var left = arr.Take(middle).ToList();
        var right = arr.Skip(middle).ToList();
        return Merge(Sort(left), Sort(right));
    }

    private static IEnumerable<T> Merge(IEnumerable<T> left, IEnumerable<T> right)
    {
        var arrSorted = new List<T>();

        while (left.Count() > 0 && right.Count() > 0)
        {
            if (Comparer<T>.Default.Compare(left.First(), right.First()) < 0)
            {
                arrSorted.Add(left.First());
                left=left.Skip(1);
            }
            else
            {
                arrSorted.Add(right.First());  
                right=right.Skip(1);  
            }
        }

        return arrSorted.Concat(left).Concat(right);
    }
}

If I remove the .ToList() on the left and right variables it fails to sort correctly. Do you see why?

Example

var ints = new List<int> { 5, 8, 2, 1, 7 };
var mergeSortInt = new MergeSort<int>();
var sortedInts = mergeSortInt.Sort(ints);

With .ToList()

    [0]: 1
    [1]: 2
    [2]: 5
    [3]: 7
    [4]: 8

Without .ToList()

    [0]: 1
    [1]: 2
    [2]: 5
    [3]: 7
    [4]: 2

Edit

It was my stupid test that got me.

I tested it like this:

var sortedInts = mergeSortInt.Sort(ints);
ints.Sort();
if (Enumerable.SequenceEqual(ints, sortedInts)) Console.WriteLine("ints sorts ok");

just changing the first row to

var sortedInts = mergeSortInt.Sort(ints).ToList();

removes the problem (and the lazy evaluation).

EDIT 2010-12-29

I thought I would figure out just how the lazy evaluation messes things up here but I just don't get it.

Remove the .ToList() in the Sort method above like this

var left = arr.Take(middle);
var right = arr.Skip(middle);

then try this

var ints = new List<int> { 5, 8, 2 };
var mergeSortInt = new MergeSort<int>();
var sortedInts = mergeSortInt.Sort(ints);
ints.Sort();
if (Enumerable.SequenceEqual(ints, sortedInts)) Console.WriteLine("ints sorts ok");

When debugging You can see that before ints.Sort() a sortedInts.ToList() returns

[0]: 2
[1]: 5
[2]: 8

but after ints.Sort() it returns

[0]: 2
[1]: 5
[2]: 5

What is really happening here?

Upvotes: 6

Views: 1394

Answers (4)

Kobi
Kobi

Reputation: 138017

Your function is correct - if you inspect the result of Merge, you'll see the result is sorted (example).
So where's the problem? Just as you've suspected, you're testing it wrong - when you call Sort on your original list you change all collections that derive from it!
Here's a snippet that demonstrates what you did:

List<int> numbers = new List<int> {5, 4};
IEnumerable<int> first = numbers.Take(1);
Console.WriteLine(first.Single()); //prints 5
numbers.Sort();
Console.WriteLine(first.Single()); //prints 4!

All collections you create are basically the same as first - in a way, they are lazy pointers to positions in ints. Obviously, when you call ToList, the problem is eliminated.

Your case is more complex than that. Your Sort is part lazy, exactly as you suggest: First you create a list (arrSorted) and add integers to it. That part isn't lazy, and is the reason you see the first few elements sorted. Next, you add the remaining elements - but Concat is lazy. Now, recursion enters to mess this even more: In most cases, most elements on your IEnumerable are eager - you create lists out of left and right, which are also made of mostly eager + lazy tail. You end up with a sorted List<int>, lazily concated to a lazy pointer, which should be just the last element (other elements were merged before).
Here's a call graph of your functions - red indicated a lazy collection, black a real number:

alt text

When you change the list the new list is mostly intact, but the last element is lazy, and point to the position of the largest element in the original list.

The result is mostly good, but its last element still points to the original list:

alt text

One last example: consider you're changing all elements in the original list. As you can see, most elements in the sorted collection remain the same, but the last is lazy and points to the new value:

var ints = new List<int> { 3,2,1 };
var mergeSortInt = new MergeSort<int>();
var sortedInts = mergeSortInt.Sort(ints);
// sortedInts is { 1, 2, 3 }
for(int i=0;i<ints.Count;i++) ints[i] = -i * 10;
// sortedInts is { 1, 2, 0 }

Here's the same example on Ideone: http://ideone.com/FQVR7

Upvotes: 10

Itay Karo
Itay Karo

Reputation: 18286

I ran it with and without the list and it worked.
Anyway, one of the strengths points of merge sort is its ability to sort in-place with O(1) space complexity that this implementation will not benefit.

Upvotes: 2

Jon Skeet
Jon Skeet

Reputation: 1500495

Unable to reproduce - I've just tried this, and it works absolutely fine. Obviously it's rather inefficient in various ways, but removing the ToList calls didn't make it fail.

Here's my test code, with your MergeSort code as-is, but without the ToList() calls:

using System;
using System.Collections.Generic;

public static class Extensions
{
    public static void Dump<T>(this IEnumerable<T> items, string name)
    {
        Console.WriteLine(name);
        foreach (T item in items)
        {
            Console.Write(item);
            Console.Write(" ");
        }
        Console.WriteLine();
    }
}

class Test
{    
    static void Main()
    {
        var ints = new List<int> { 5, 8, 2, 1, 7 };
        var mergeSortInt = new MergeSort<int>();
        var sortedInts = mergeSortInt.Sort(ints);
        sortedInts.Dump("Sorted");
    }
}

Output:

Sorted
1 2 5 7 8

Perhaps the problem was how you were testing your code?

Upvotes: 6

user432219
user432219

Reputation:

The problem is you sort the left right, than the right side and merge to one sequence. That does not mean that you get a completely sorted sequence.

First you have to merge and than you have to sort:

public IEnumerable<T> Sort(IEnumerable<T> arr)
{
    if (arr.Count() <= 1) return arr;

    int middle = arr.Count() / 2;
    var left = arr.Take(middle).ToList();
    var right = arr.Skip(middle).ToList();

    // first merge and than sort
    return Sort(Merge(left, right));
}

Upvotes: 0

Related Questions