user1490083
user1490083

Reputation: 391

Haskell Merging multiple lists

I want to write a merge function that takes multiple x sorted lists and merges them into just one sorted list by incremental values (smallest to largest). I think I can do two lists and combine into one, but can't figure out the base case for multiple lists and combining into one sorted.

merge :: [[a]] -> [a]

Upvotes: 4

Views: 8177

Answers (6)

Redu
Redu

Reputation: 26161

This is a fundamental part of the mergeSort or timSort algorithm but sadly there is no correct answer under this topic. I mean an efficient one.

Lets assume a chunks list with k = 9 chunks such as [as,bs,cs,ds,es,fs,gs,hs,is] where as..is are the chunks.

foldl merger [] chunks

would yield the following operations

merge as bs -> abs
merge abs cs -> abcs
merge abcs ds -> abcds
merge abcds es -> abcdes
merge abcdes fs -> abcdefs
merge abcdefs gs -> abcdefgs
merge abchefgs hs -> abcdefghs
merge abcdefghs is -> abcdefghis

Where as and bs get operated 8 times, cs gets operated 7 .. is gets operated 1 time. That's k(k+1)/2 = 44 operations in total. If you had 100,000 chunks that would make 5,000,050,000 operations. The complexity is O(n^2).

@kyticka seems to have noticed this fact and mentions in his answer about the importance of it but fails to implement it correctly (due to double recursion which is something deceiving sometimes). His approach eliminates excess operations only at the first stage and puts chunks in sorted couples but then carries on like the naive approach, only folding from right as follows;

let mt = mergeTwo
    mp = mergePairs

mp (mt as bs : mp [cs,ds,es,fs,gs,hs,is]))
mp (mt as bs : mp (mt cs ds : mp [es,fs,gs,hs,is]))
mp (mt as bs : mp (mt cs ds : mp (mt es fs : mp [gs,hs,is])))
mp (mt as bs : mp (mt cs ds : mp (mt es fs : mp (mt gs hs : mp [is])))
mp (mt as bs : mp (mt cs ds : mp (mt es fs : mp [ghs,is])))
mp (mt as bs : mp (mt cs ds : mp (mt es fs : mp (mt ghs is : mp []))))
mp (mt as bs : mp (mt cs ds : mp (mt es fs : mp [ghis])))
mp (mt as bs : mp (mt cs ds : mp [efs, ghis]))
mp (mt as bs : mp (mt cs ds : mp (mt efs ghis : mp [])))
mp (mt as bs : mp (mt cs ds : mp [efghis]))
mp (mt as bs : mp [cds, efghis])
mp (mt as bs : mp (mt cds efghis : mp []))
mp (mt as bs : mp [cdefghis])
mp [abs, cdefghis]
mp (mt abs cdefghis : mp [])
mp [abcdefghis]
[abcdefghis]

So chunks' merge operation counts are as follows

  • as = 2 (mt as bs - mt abs cdefghis)
  • bs = 2 (mt as bs - mt abs cdefghis)
  • cs = 3 (mt cs ds - mt cds efghis - mt abs cdefghis)
  • ds = 3 (mt cs ds - mt cds efghis - mt abs cdefghis)
  • es = 4 (mt es fs - mt efs ghis - mt cds efghis - mt cds efghis)
  • fs = 4 same as es
  • gs = 5 ...
  • hs = 5 same as gs
  • is = 5

Total number of operations is 33. I guess this is may be like k(k+1)/2 - k..? It wont make any noticable difference compared to the naive foldl solution since still it is O(n^2).

Here is an O(nLog(n)) attempt.

mergeChunks :: Ord a => [a] -> [a] -> [a]
mergeChunks xs [] = xs
mergeChunks [] ys = ys
mergeChunks p@(x:xs) q@(y:ys) | x <= y    = x : mergeChunks xs q
                              | otherwise = y : mergeChunks p ys

mergePairs :: Ord a => [[a]] -> [[a]]
mergePairs []         = []
mergePairs (ps:[])    = [ps]
mergePairs (ps:qs:rs) = mergeChunks ps qs : mergePairs rs

mergeAll :: Ord a => [[a]] -> [a]
mergeAll []     = []
mergeAll [c]    = c
mergeAll cs     = mergeAll (mergePairs cs)

Upvotes: 2

jaket
jaket

Reputation: 9341

It's much easier if you first merge the lists with concat and then sort.

import Data.List(sort)

mergeSort = sort . concat

Upvotes: 3

Ffffrrr
Ffffrrr

Reputation: 1

import Data.List ( sort )

sortLists :: (Ord a) => [[a]] -> [a]
sortLists cs = concatMap sort cs

test = [ [3,1,2],
         [5,4],
         [6,8,9,7] ]

main = print $ sortLists test

To really enjoy haskell, take a good look and learn by heart the Prelude and Data.List. They are the bread and butter of every Haskell programmer.

Upvotes: -3

luqui
luqui

Reputation: 60463

@sepp2k's answer is good, but it only works on finitely many input lists. If you give it infinitely many lists, it will take forever trying to find the minimum starting element.

We can fix this by requiring that the input lists are already sorted by increasing first elements. Then we know that we can yield the "top-left" element (first element of the first list) because it will be a lower bound for everything, which gives us enough information to use recursively and produce a complete merge.

merge :: (Ord a) => [[a]] -> [a]
merge [] = []
merge ([]:xss) = merge xss
merge ((x:xs):xss) = x : merge2 xs (merge xss)

Writing merge2 still left as an exercise for the reader :-)

Upvotes: 6

kyticka
kyticka

Reputation: 604

Maybe a faster implementation:

mergeTwo :: Ord a => [a] -> [a] -> [a]
mergeTwo x [] = x
mergeTwo [] x = x
mergeTwo (x:xs) (y:ys) = if x < y
                          then x:(mergeTwo xs (y:ys))
                          else y:(mergeTwo (x:xs) ys)

mergePairs :: Ord a => [[a]] -> [[a]]
mergePairs [] = []
mergePairs (x:[]) = [x]
mergePairs (x:y:tail) = mergePairs ((mergeTwo x y):(mergePairs tail))

mergeAll :: Ord a => [[a]] -> [a]
mergeAll [] = []
mergeAll x = head $ mergePairs x

mergeTwo just merges two lists. mergeAll just runs mergePairs and returns head if there is some. Magic happens in mergePairs, which takes list of lists and merges pairs, than does this again and so on, while there are at least two lists.

It might be faster, imagine you are running

merge = foldl merge2 []

It takes one long list and merges and merges. If you run it at [[1,2,3],[4,5,6],[7,8,9],[10,11,12]], it merges:

[] with [1,2,3]

[1,2,3] with [4,5,6]

[1,2,3,4,5,6] with [7,8,9]

[1,2,3,4,5,6,7,8,9] with [10,11,12]

But you want to keep lists of approx same lenght. So you want to merge:

[1,2,3] with [4,5,6]

[7,8,9] with [10,11,12]

[1,2,3,4,5,6] with [7,8,9,10,11,12]

You could also consider paraller implementation of mergePairs, it could be useful on multicore processors. But I have no experience in this :/

Upvotes: 10

sepp2k
sepp2k

Reputation: 370112

If you can define the function for two lists, then you can generalize it to arbitrarily many lists by simply going through each sublist and merging it with the current result until you've merge all the lists. This can be expressed as a fold like this:

merge = foldr merge2 []

Upvotes: 8

Related Questions