Reputation: 5279
I want to take n biggest elements from lazy list.
I heard that mergesort implemented in Data.List.sort is lazy and it doesn't produce more elements than necessary. This might be true in terms of comparisons, but certainly isn't the case when it comes to memory usage. The following program illustrates the issue:
{-# LANGUAGE ScopedTypeVariables #-}
module Main where
import qualified Data.Heap as Heap
import qualified Data.List as List
import System.Random.MWC
import qualified Data.Vector.Unboxed as Vec
import System.Environment
limitSortL n xs = take n (List.sort xs)
limitSortH n xs = List.unfoldr Heap.uncons (List.foldl' (\ acc x -> Heap.take n (Heap.insert x acc) ) Heap.empty xs)
main = do
st <- create
rxs :: [Int] <- Vec.toList `fmap` uniformVector st (10^7)
args <- getArgs
case args of
["LIST"] -> print (limitSortL 20 rxs)
["HEAP"] -> print (limitSortH 20 rxs)
return ()
Runtime:
Data.List:
./lazyTest LIST +RTS -s [-9223371438221280004,-9223369283422017686,-9223368296903201811,-9223365203042113783,-9223364809100004863,-9223363058932210878,-9223362160334234021,-9223359019266180408,-9223358851531436915,-9223345045262962114,-9223343191568060219,-9223342956514809662,-9223341125508040302,-9223340661319591967,-9223337771462470186,-9223336010230770808,-9223331570472117335,-9223329558935830150,-9223329536207787831,-9223328937489459283] 2,059,921,192 bytes allocated in the heap 2,248,105,704 bytes copied during GC 552,350,688 bytes maximum residency (5 sample(s)) 3,390,456 bytes maximum slop 1168 MB total memory in use (0 MB lost due to fragmentation) Generation 0: 3772 collections, 0 parallel, 1.44s, 1.48s elapsed Generation 1: 5 collections, 0 parallel, 0.90s, 1.13s elapsed INIT time 0.00s ( 0.00s elapsed) MUT time 0.82s ( 0.84s elapsed) GC time 2.34s ( 2.61s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 3.16s ( 3.45s elapsed) %GC time 74.1% (75.7% elapsed) Alloc rate 2,522,515,156 bytes per MUT second Productivity 25.9% of total user, 23.7% of total elapsed
Data.Heap:
./lazyTest HEAP +RTS -s [-9223371438221280004,-9223369283422017686,-9223368296903201811,-9223365203042113783,-9223364809100004863,-9223363058932210878,-9223362160334234021,-9223359019266180408,-9223358851531436915,-9223345045262962114,-9223343191568060219,-9223342956514809662,-9223341125508040302,-9223340661319591967,-9223337771462470186,-9223336010230770808,-9223331570472117335,-9223329558935830150,-9223329536207787831,-9223328937489459283] 177,559,536,928 bytes allocated in the heap 237,093,320 bytes copied during GC 80,031,376 bytes maximum residency (2 sample(s)) 745,368 bytes maximum slop 78 MB total memory in use (0 MB lost due to fragmentation) Generation 0: 338539 collections, 0 parallel, 1.24s, 1.31s elapsed Generation 1: 2 collections, 0 parallel, 0.00s, 0.00s elapsed INIT time 0.00s ( 0.00s elapsed) MUT time 35.24s ( 35.46s elapsed) GC time 1.24s ( 1.31s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 36.48s ( 36.77s elapsed) %GC time 3.4% (3.6% elapsed) Alloc rate 5,038,907,812 bytes per MUT second Productivity 96.6% of total user, 95.8% of total elapsed
Clearly limitSortL is much faster, but it's also very memory hungry. On larger lists it hit's RAM size.
Is there a faster algorithm to solve this problem which isn't that memory hungry?
Edit: Clarification: I use Data.Heap from heaps package, I didn't try the heap package.
Upvotes: 7
Views: 769
Reputation: 28539
Memory efficiency is rarely haskell's strength. That said, it is not that hard to produce a sorting algorithm that is more fully lazy than a mergesort. For example, here is a simple quicksort:
qsort [] = []
qsort (x:xs) = qcombine (qsort a) b (qsort c) where
(a,b,c) = qpart x (x:xs) ([],[],[])
qpart _ [] ac = ac
qpart n (x:xs) (a,b,c)
| x > n = qpart n xs (a,b,x:c)
| x < n = qpart n xs (x:a,b,c)
| otherwise = qpart n xs (a,x:b,c)
qcombine (a:as) b c = a:qcombine as b c
qcombine [] (b:bs) c = b:qcombine [] bs c
qcombine [] [] c = c
I used explicit recursion to make it obvious what is going on. Each part here is truly lazy, meaning that qcombine
will never call qsort c
unless it needs it. This should keep your memory usage down if you just want the first few items.
You can build a better sorting algorithm for this specific task that uses quicksort style partions to get the first n items of the list in unsorted order. Then just call a highly efficient sorting algorithm on those if you need them in order.
an example of that approach:
qselect 0 _ = []
qselect n [] = error ("cant produce " ++ show n ++ " from empty list")
qselect n (x:xs)
| al > n = qselect n a
| al + bl > n = a ++ take (al - n) b
| otherwise = a ++ b ++ (qselect (n - al - bl) c) where
(a,al,b,bl,c,cl) = qpartl x (x:xs) ([],0,[],0,[],0)
qpartl _ [] ac = ac
qpartl n (x:xs) (a,al,b,bl,c,cl)
| x > n = qpartl n xs (a,al,b,bl,x:c,cl+1)
| x < n = qpartl n xs (x:a,al+1,b,bl,c,cl+1)
| otherwise = qpartl n xs (a,al,x:b,bl+1,c,cl)
Again, this code is not the cleanest, but I want to make it clear what it is doing.
For the case where you want to take a very low number, a selection sort is optimal. For example, if you just want the highest element in the list, you can iterate through it once giving big theta of the size of the list.
On the other hand, if you want almost all the list, but dont care about getting it in order, you could repeatedly "delete" the lowest elements in the list.
Both of these approaches, and the quicksort above, are O(n^2), but what you want is to have a strategy that works in big O(k*n) often, and tends to not use a ton of space.
Another option is to use an in-place sorting algorithm to control memory usage. I dont know of any lazy in-place sorts, but if they exist that would be perfect.
Upvotes: 0
Reputation: 38893
"Quicksort and k-th smallest elements," by the always engaging Heinrich Apfelmus: http://apfelmus.nfshost.com/articles/quicksearch.html
Upvotes: 2
Reputation: 5279
So, I've actually managed to solve the problem. The idea is to throw away fancy data structures and work by hand ;-)
Essentially we split input list into chunks, sort them, and foldl the [[Int]]
list, selecting n
smallest elements at each step.
The trickies part is merging accumulator with sorted chunk in proper way. We have to use seq
or otherwise the lazyness will bite you and the result still need lot's of memory to compute. Additionally I mix merge with take n
, just to optimize things more. Here is the whole program, along with previous attempts:
{-# LANGUAGE ScopedTypeVariables, PackageImports #-}
module Main where
import qualified Data.List as List
import qualified Data.List.Split as Split
import qualified "heaps" Data.Heap as Heap -- qualified import from "heaps" package
import System.Random.MWC
import qualified Data.Vector.Unboxed as Vec
import System.Environment
limitSortL n xs = take n (List.sort xs)
limitSortH n xs = List.unfoldr Heap.uncons (List.foldl' (\ acc x -> Heap.take n (Heap.insert x acc) ) Heap.empty xs)
takeSortMerge n inp = List.foldl'
(\acc lst -> (merge n acc (List.sort lst)))
[] (Split.splitEvery n inp)
where
merge 0 _ _ = []
merge _ [] xs = xs
merge _ ys [] = ys
merge f (x:xs) (y:ys) | x < y = let tail = merge (f-1) xs (y:ys) in tail `seq` (x:tail)
| otherwise = let tail = merge (f-1) (x:xs) ys in tail `seq` (y:tail)
main = do
st <- create
let n1 = 10^7
n2 = 20
rxs :: [Int] <- Vec.toList `fmap` uniformVector st (n1)
args <- getArgs
case args of
["LIST"] -> print (limitSortL n2 rxs)
["HEAP"] -> print (limitSortH n2 rxs)
["MERGE"] -> print (takeSortMerge n2 rxs)
_ -> putStrLn "Nothing..."
return ()
Runtime performance, memory consumption, GC time:
LIST 3.96s 1168 MB 75 % HEAP 35.29s 78 MB 3.6 % MERGE 1.00s 78 MB 3.0 % just rxs 0.21s 78 MB 0.0 % -- just evaluating the random vector
Upvotes: 4
Reputation: 33063
You might be misdiagnosing the problem. It might be a case of too much laziness, rather than too little.
Maybe you should try a stricter data structure, or a mutable array in the ST monad.
For the mutable array approach, you could limit the number of moves per insertion to n/2
instead of n-1
, by recording an index h
that "points to" the head of the queue stored in the array, and allowing the queue to "wrap around" inside the array.
Upvotes: 0
Reputation: 69934
There are a whole lot of selection algorithms specialized in doing exactly this. The partition based algorithm is the "classic one", but just like Quicksort it isn't really suitable for Haskell lists. The wikipedia doesn't show much related to functional programming, although I suspect that the "tournament selection" described is the same or not very different from your current mergesort solution.
If you are worried about memory consumption, you could use a priority Queue - it uses O(K) memory and O(N*logK) time overall:
queue := first k elements
for each element in the rest:
add the element to the queue
remove the largest element from the queue
convert the queue to a sorted list
Upvotes: 3
Reputation: 36339
Excuse me if I can't decipher
Vec.toList `fmap` uniformVector st (10^7)
but how long will this list be? Is it clear that no matter how lazy mergesort is, it will at least have to realize the whole list?
Update:
I heard that mergesort implemented in Data.List.sort is lazy and it doesn't produce more elements than necessary.
That tells nothing about mergesorts space consumption before it can start to deliver the first elements of a list. In any case, it'll have to walk (and thereby realize) the whole list, allocate to be merged sublists, etc. Here is acitation from http://www.inf.fh-flensburg.de/lang/algorithmen/sortieren/merge/mergen.htm
A drawback of mergesort is that it needs an additional space of Θ(n) for the temporary array b.
There are different possibilities to implement function merge. The most efficient of these is variant b. It requires only half as much additional space, it is faster than the other variants, and it is stable.
Upvotes: 1