tonlika
tonlika

Reputation: 737

Why is MergeSort in Haskell faster when implemented with foldl'?

I have implemented two versions of Merge Sort in Haskell like follows:

mergeSort1 :: (Ord a) => [a] -> [a]
mergeSort1 xs = foldl' (\acc x -> merge [x] acc) [] xs

and

mergeSort2 :: (Ord a) => [a] -> [a]
mergeSort2 [] = []
mergeSort2 (x:[]) = [x]
mergeSort2 xs = (mergeSort2 $ fst halves) `merge` (mergeSort2 $ snd halves)
         where halves = splitList xs

where 'merge' and 'splitList' are implemented as follows:

merge :: (Ord a) => [a] -> [a] -> [a]
merge [] [] = []
merge xs [] = xs
merge [] ys = ys
merge all_x@(x:xs) all_y@(y:ys)
     | x < y = x:merge xs all_y
     | otherwise = y:merge all_x ys

splitList :: [a] -> ([a], [a])
splitList zs = go zs [] [] where
     go [] xs ys = (xs, ys)
     go [x] xs ys = (x:xs, ys)
     go (x:y:zs) xs ys = go zs (x:xs) (y:ys)

Doing last $ mergeSort2 [1000000,999999..0] in ghci results in showing the number 1000000 after more than a minute of processing, while doing last $ mergeSort1 [1000000,999999..0] results in showing the last element only after 5 seconds.

I can understand why mergeSort1 uses much less memory than mergeSort2 because of the tail-recursiveness of foldl' and so.

What I can't understand is why mergeSort1 is faster than mergeSort2 by such a big difference ?

Could it be that splitList is the bottleneck in mergeSort2, generating two new lists every call?

Upvotes: 3

Views: 932

Answers (1)

Daniel Fischer
Daniel Fischer

Reputation: 183888

As is,

mergeSort2 :: (Ord a) => [a] -> [a]
mergeSort2 xs = (mergeSort2 $ fst halves) `merge` (mergeSort2 $ snd halves)
         where halves = splitList xs

is an infinite recursion, since you haven't given a base case (you need to specify the result for lists of length < 2). After that is fixed, mergeSort2 is still relatively slow due to the splitList which requires a complete traversal in each step and builds two new lists, not allowing to process anything before that is completed. A simple

splitList zs = splitAt h zs where h = length zs `quot` 2

does much better.

Your mergeSort1, however, is not a merge sort at all, it is an insertion sort.

mergeSort1 :: (Ord a) => [a] -> [a]
mergeSort1 xs = foldl' (\acc x -> merge [x] acc) [] xs

That does particularly well on reverse-sorted input, but if you give it sorted or random input, it scales quadratically.

So mergeSort1 was faster because you gave it optimal input, where it finishes in linear time.

Upvotes: 9

Related Questions