kunigami

Reputation: 3086

Improving performance on chunked lists

I have a simple problem: given a list of integers, read the first line as N, then read the next N lines and output their sum. Repeat until N = 0.
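For example (my own sample, not from the original benchmark), given this input the program should print 30 followed by 6:

```
2
10
20
3
1
2
3
0
```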

My first approach was using this:

main = interact $ unlines . f . map read . lines

f :: [Int] -> [String]
f (n:ls)
  | n == 0    = []
  | otherwise = show rr : f rest
  where (xs, rest) = splitAt n ls
        rr = sum xs
f _ = []

But it's relatively slow. I've profiled it using

ghc -O2 --make test.hs -prof -auto-all -caf-all -fforce-recomp -rtsopts
time ./test +RTS -hc -p -i0.001 < input.in

where input.in is a test input whose first line is 100k, followed by 100k random numbers, followed by 0. As the figure below shows, it uses O(N) memory:

[Figure: heap profile showing memory usage growing linearly with input size]

EDITED: My original question compared two similarly slow approaches. I've updated it to compare against the optimized approach below.

Now, if I do the sum iteratively, instead of calling sum, I get a constant amount of memory:

{-# LANGUAGE BangPatterns #-}

main = interact $ unlines . g . map read . lines

g :: [Int] -> [String]
g (n:ls)
  | n == 0    = []
  | otherwise = g' n ls 0
g _ = []

g' :: Int -> [Int] -> Int -> [String]
g' n (l:ls) !cnt
  | n == 0    = show cnt : g (l:ls)
  | otherwise = g' (n-1) ls (cnt + l)

[Figure: heap profile showing constant memory usage]

I'm trying to understand what is causing the performance loss in the first example. I would have guessed that everything there could be lazily evaluated, so why does it hold on to O(N) memory?

Upvotes: 4

Views: 141

Answers (2)

Ørjan Johansen

Reputation: 18189

I think that laziness is what prevents your first and second version from being equivalent.

Consider the result produced from this input of "numbers":

1
garbage_here
2
3
5
0

The first version would give the result list [error "...some parse error", 8], whose second element you can safely look at, while the second version errors out almost immediately. It seems hard to achieve the first behavior in a streaming way.
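A runnable sketch of that difference (my construction, not from the answer: undefined stands in for the parse error, and the chunk sums are kept as plain Ints rather than Strings):

```haskell
-- Chunked sums in the style of the question's first version. Each
-- chunk's sum is a lazy thunk, so a bad element only poisons its own
-- chunk: the spine of the result list can still be traversed.
f :: [Int] -> [Int]
f (n:ls)
  | n == 0    = []
  | otherwise = sum xs : f rest
  where (xs, rest) = splitAt n ls
f _ = []

main :: IO ()
main = print (f [1, undefined, 2, 3, 5, 0] !! 1)
-- Indexing past the poisoned first chunk succeeds, because the
-- undefined element is never forced.
```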

Even without laziness, though, getting from the first to the second version may be more than GHC can handle: it would need fusion rewrite rules combining foldl/foldl' on the first component of a tuple with splitAt, and GHC has only recently got to the point where it can fuse foldl/foldl' at all.

Upvotes: 1

MathematicalOrchid

Reputation: 62818

I don't know precisely what is causing the difference. But I can show you this:

Data.Map> sum [1 .. 1e8]
Out of memory.

Data.Map> foldl' (+) 0 [1 .. 1e8]
5.00000005e15

For some reason, sum = foldl (+) 0, rather than foldl' (with the apostrophe). The difference is that foldl' is strict in its accumulator, so it uses virtually no memory. The lazy version, by contrast, does this:

sum [1..100]
1 + sum [2..100]
1 + 2 + sum [3..100]
1 + 2 + 3 + sum [4..100]
...

In other words, it creates a giant expression that says 1 + 2 + 3 + ... And then, right at the end, it tries to evaluate it all. Well, obviously, that's going to eat a lot of RAM. By using foldl' instead of foldl, you make it do the additions immediately, rather than pointlessly storing them in RAM.
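Applied to the question's first version, the one-line fix would be to swap sum for foldl' (+) 0 (a sketch, untested against the original benchmark):

```haskell
import Data.List (foldl')

-- Same structure as the asker's f, but with a strict sum: foldl'
-- forces the running total at each step, so no thunk chain builds up.
f :: [Int] -> [String]
f (n:ls)
  | n == 0    = []
  | otherwise = show (foldl' (+) 0 xs) : f rest
  where (xs, rest) = splitAt n ls
f _ = []

main :: IO ()
main = interact $ unlines . f . map read . lines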

You probably also want to do I/O using ByteString rather than String; but the laziness difference will probably give you a big speed boost on its own.
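A sketch of what a ByteString variant might look like, using the lazy flavor so the input is still consumed incrementally (parseInts is my own hypothetical helper; it silently skips unparsable lines via readInt returning Nothing, whereas the String version would crash on them):

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Data.List (foldl')
import Data.Maybe (mapMaybe)

-- Parse each line with the fast B.readInt instead of String-based read.
parseInts :: B.ByteString -> [Int]
parseInts = mapMaybe (fmap fst . B.readInt) . B.lines

-- Chunked strict sums, as in the question.
f :: [Int] -> [Int]
f (n:ls)
  | n == 0    = []
  | otherwise = foldl' (+) 0 xs : f rest
  where (xs, rest) = splitAt n ls
f _ = []

main :: IO ()
main = B.interact $ B.unlines . map (B.pack . show) . f . parseInts
```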

Upvotes: 3
