Reputation: 31428
While implementing this generic merge sort, as a kind of Code Kata, I stumbled on a difference between IEnumerable and List that I need help to figure out.
Here's the MergeSort
public class MergeSort<T>
{
public IEnumerable<T> Sort(IEnumerable<T> arr)
{
if (arr.Count() <= 1) return arr;
int middle = arr.Count() / 2;
var left = arr.Take(middle).ToList();
var right = arr.Skip(middle).ToList();
return Merge(Sort(left), Sort(right));
}
private static IEnumerable<T> Merge(IEnumerable<T> left, IEnumerable<T> right)
{
var arrSorted = new List<T>();
while (left.Count() > 0 && right.Count() > 0)
{
if (Comparer<T>.Default.Compare(left.First(), right.First()) < 0)
{
arrSorted.Add(left.First());
left=left.Skip(1);
}
else
{
arrSorted.Add(right.First());
right=right.Skip(1);
}
}
return arrSorted.Concat(left).Concat(right);
}
}
If I remove the .ToList()
on the left
and right
variables it fails to sort correctly. Do you see why?
Example
var ints = new List<int> { 5, 8, 2, 1, 7 };
var mergeSortInt = new MergeSort<int>();
var sortedInts = mergeSortInt.Sort(ints);
With .ToList()
[0]: 1 [1]: 2 [2]: 5 [3]: 7 [4]: 8
Without .ToList()
[0]: 1 [1]: 2 [2]: 5 [3]: 7 [4]: 2
Edit
It was my stupid test that got me.
I tested it like this:
var sortedInts = mergeSortInt.Sort(ints);
ints.Sort();
if (Enumerable.SequenceEqual(ints, sortedInts)) Console.WriteLine("ints sorts ok");
just changing the first row to
var sortedInts = mergeSortInt.Sort(ints).ToList();
removes the problem (and the lazy evaluation).
EDIT 2010-12-29
I thought I would figure out just how the lazy evaluation messes things up here but I just don't get it.
Remove the .ToList()
in the Sort method above like this
var left = arr.Take(middle);
var right = arr.Skip(middle);
then try this
var ints = new List<int> { 5, 8, 2 };
var mergeSortInt = new MergeSort<int>();
var sortedInts = mergeSortInt.Sort(ints);
ints.Sort();
if (Enumerable.SequenceEqual(ints, sortedInts)) Console.WriteLine("ints sorts ok");
When debugging You can see that before ints.Sort()
a sortedInts.ToList()
returns
[0]: 2
[1]: 5
[2]: 8
but after ints.Sort()
it returns
[0]: 2
[1]: 5
[2]: 5
What is really happening here?
Upvotes: 6
Views: 1394
Reputation: 138017
Your function is correct - if you inspect the result of Merge
, you'll see the result is sorted (example).
So where's the problem? Just as you've suspected, you're testing it wrong - when you call Sort
on your original list you change all collections that derive from it!
Here's a snippet that demonstrates what you did:
List<int> numbers = new List<int> {5, 4};
IEnumerable<int> first = numbers.Take(1);
Console.WriteLine(first.Single()); //prints 5
numbers.Sort();
Console.WriteLine(first.Single()); //prints 4!
All collections you create are basically the same as first
- in a way, they are lazy pointers to positions in ints
. Obviously, when you call ToList
, the problem is eliminated.
Your case is more complex than that. Your Sort
is part lazy, exactly as you suggest: First you create a list (arrSorted
) and add integers to it. That part isn't lazy, and is the reason you see the first few elements sorted. Next, you add the remaining elements - but Concat
is lazy. Now, recursion enters to mess this even more: In most cases, most elements on your IEnumerable
are eager - you create lists out of left and right, which are also made of mostly eager + lazy tail. You end up with a sorted List<int>
, lazily concated to a lazy pointer, which should be just the last element (other elements were merged before).
Here's a call graph of your functions - red indicated a lazy collection, black a real number:
When you change the list the new list is mostly intact, but the last element is lazy, and point to the position of the largest element in the original list.
The result is mostly good, but its last element still points to the original list:
One last example: consider you're changing all elements in the original list. As you can see, most elements in the sorted collection remain the same, but the last is lazy and points to the new value:
var ints = new List<int> { 3,2,1 };
var mergeSortInt = new MergeSort<int>();
var sortedInts = mergeSortInt.Sort(ints);
// sortedInts is { 1, 2, 3 }
for(int i=0;i<ints.Count;i++) ints[i] = -i * 10;
// sortedInts is { 1, 2, 0 }
Here's the same example on Ideone: http://ideone.com/FQVR7
Upvotes: 10
Reputation: 18286
I ran it with and without the list and it worked.
Anyway, one of the strengths points of merge sort is its ability to sort in-place with O(1) space complexity that this implementation will not benefit.
Upvotes: 2
Reputation: 1500495
Unable to reproduce - I've just tried this, and it works absolutely fine. Obviously it's rather inefficient in various ways, but removing the ToList
calls didn't make it fail.
Here's my test code, with your MergeSort
code as-is, but without the ToList()
calls:
using System;
using System.Collections.Generic;
public static class Extensions
{
public static void Dump<T>(this IEnumerable<T> items, string name)
{
Console.WriteLine(name);
foreach (T item in items)
{
Console.Write(item);
Console.Write(" ");
}
Console.WriteLine();
}
}
class Test
{
static void Main()
{
var ints = new List<int> { 5, 8, 2, 1, 7 };
var mergeSortInt = new MergeSort<int>();
var sortedInts = mergeSortInt.Sort(ints);
sortedInts.Dump("Sorted");
}
}
Output:
Sorted
1 2 5 7 8
Perhaps the problem was how you were testing your code?
Upvotes: 6
Reputation:
The problem is you sort the left right, than the right side and merge to one sequence. That does not mean that you get a completely sorted sequence.
First you have to merge and than you have to sort:
public IEnumerable<T> Sort(IEnumerable<T> arr)
{
if (arr.Count() <= 1) return arr;
int middle = arr.Count() / 2;
var left = arr.Take(middle).ToList();
var right = arr.Skip(middle).ToList();
// first merge and than sort
return Sort(Merge(left, right));
}
Upvotes: 0