pallly
pallly

Reputation: 679

Partial application memory management in Haskell

I have a function ascArr :: String -> BigData for parsing some big strict data from a string and another one, altitude :: BigData -> Pt -> Maybe Double, for getting something useful from the parsed data. I want to parse the big data once and then use the altitude function with the first argument fixed and the second one varying. Here's the code (TupleSections are enabled):

exampleParseAsc :: IO ()
exampleParseAsc = do
  asc <- readFile "foo.asc"
  let arr = ascArr asc
  print $ map (altitude arr . (, 45)) [15, 15.01 .. 16]

This is all ok. Then I want to connect the two functions together and to use partial application for caching the big data. I use three versions of the same function:

parseAsc3 :: String -> Pt -> Maybe Double
parseAsc3 str = altitude d
  where d = ascArr str

parseAsc4 :: String -> Pt -> Maybe Double
parseAsc4 str pt = altitude d pt
  where d = ascArr str

parseAsc5 :: String -> Pt -> Maybe Double
parseAsc5 = curry (uncurry altitude . first ascArr)

And I call them like this:

exampleParseAsc2 :: IO ()
exampleParseAsc2 = do
  asc <- readFile "foo.asc"
  let alt = parseAsc5 asc
  print $ map (alt . (, 45)) [15, 15.01 .. 16]

Only the parseAsc3 works like in the exampleParseAsc: Memory usage rises at the beginning (when allocating memory for the UArray in the BigData), then it is constant while parsing, then altitude quickly evaluates the result and then everything is done and the memory is freed. The other two versions are different: The memory usage rises multiple times until all the memory is consumed, I think that the parsed big data is not cached inside the alt closure. Could someone explain the behaviour? Why are the versions 3 and 4 not equivalent? In fact I started with something like parseAsc2 function and just after hours of trial I found out the parseAsc3 solution. And I am not satisfied without knowing the reason...

Here you can see all my effort (only the parseAsc3 does not consume whole the memory; parseAsc is a bit different from the others - it uses parsec and it was really greedy for memory, I'd be glad if some one explained me why, but I think that the reason is different than the main point of this question, you may just skip it):

type Pt      = (Double, Double)
type BigData = (UArray (Int, Int) Double, Double, Double, Double)

parseAsc :: String -> Pt -> Maybe Double
parseAsc str (x, y) =
  case parse ascParse "" str of
    Left err -> error "no parse"
    Right (x1, y1, coef, m) ->
      let bnds = bounds m
          i    = (round $ (x - x1) / coef, round $ (y - y1) / coef)
      in if inRange bnds i then Just $ m ! i else Nothing
 where
  ascParse :: Parsec String () (Double, Double, Double, UArray (Int, Int) Double)
  ascParse = do
    [w, h] <- mapM ((read <$>) . keyValParse digit) ["ncols", "nrows"]
    [x1, y1, coef] <- mapM ((read <$>) . keyValParse (digit <|> char '.'))
                           ["xllcorner", "yllcorner", "cellsize"]
    keyValParse anyChar "NODATA_value"
    replicateM 6 $ manyTill anyChar newline
    rows <- replicateM h . replicateM w
          $ read <$> (spaces *> many1 digit)

    return (x1, y1, coef, listArray ((0, 0), (w - 1, h - 1)) (concat rows))

  keyValParse :: Parsec String () Char -> String -> Parsec String () String
  keyValParse format key = string key *> spaces *> manyTill format newline

parseAsc2 :: String -> Pt -> Maybe Double
parseAsc2 str (x, y) = if all (inRange bnds) (is :: [(Int, Int)])
                         then Just $ (ff * (1 - px) + cf * px) * (1 - py)
                                   + (fc * (1 - px) + cc * px) * py
                         else Nothing
  where (header, elevs) = splitAt 6 $ lines str
        header' = map ((!! 1) . words) header
        [w, h] = map read $ take 2 header'
        [x1, y1, coef, _] = map read $ drop 2 header'
        bnds = ((0, 0), (w - 1, h - 1))

        arr :: UArray (Int, Int) Double
        arr = listArray bnds (concatMap (map read . words) elevs)

        i = [(x - x1) / coef, (y - y1) / coef]
        [ixf, iyf, ixc, iyc] = [floor, ceiling] >>= (<$> i)
        is = [(ix, iy) | ix <- [ixf, ixc], iy <- [iyf, iyc]]
        [px, py] = map (snd . properFraction) i
        [ff, cf, fc, cc] = map (arr !) is

ascArr :: String -> BigData
ascArr str = (listArray bnds (concatMap (map read . words) elevs), x1, y1, coef)
  where (header, elevs) = splitAt 6 $ lines str
        header' = map ((!! 1) . words) header
        [w, h] = map read $ take 2 header'
        [x1, y1, coef, _] = map read $ drop 2 header'
        bnds = ((0, 0), (w - 1, h - 1))

altitude :: BigData -> Pt -> Maybe Double
altitude d (x, y) = if all (inRange bnds) (is :: [(Int, Int)])
                      then Just $ (ff * (1 - px) + cf * px) * (1 - py)
                                + (fc * (1 - px) + cc * px) * py
                      else Nothing
  where (arr, x1, y1, coef) = d
        bnds = bounds arr
        i = [(x - x1) / coef, (y - y1) / coef]
        [ixf, iyf, ixc, iyc] = [floor, ceiling] >>= (<$> i)
        is = [(ix, iy) | ix <- [ixf, ixc], iy <- [iyf, iyc]]
        [px, py] = map (snd . properFraction) i
        [ff, cf, fc, cc] = map (arr !) is

parseAsc3 :: String -> Pt -> Maybe Double
parseAsc3 str = altitude d
  where d = ascArr str

parseAsc4 :: String -> Pt -> Maybe Double
parseAsc4 str pt = altitude d pt
  where d = ascArr str

parseAsc5 :: String -> Pt -> Maybe Double
parseAsc5 = curry (uncurry altitude . first ascArr)

Compiled with GHC 7.10.3, with -O optimization.

Thank you.

Upvotes: 3

Views: 98

Answers (1)

user2407038
user2407038

Reputation: 14578

You can figure out what's happening by looking at the generated core from GHC. The evaluation semantics of optimized core are very predictable (unlike Haskell itself) so it is often a useful tool for performance analysis.

I compiled your code with ghc -fforce-recomp -O2 -ddump-simpl file.hs with GHC 7.10.3. You can look at the full output yoursefl but I've extracted the relevant bits:

$wparseAsc2
$wparseAsc2 =
  \ w_s8e1 ww_s8e5 ww1_s8e6 ->
    let { ...

parseAsc2 =
  \ w_s8e1 w1_s8e2 ->
    case w1_s8e2 of _ { (ww1_s8e5, ww2_s8e6) ->
    $wparseAsc2 w_s8e1 ww1_s8e5 ww2_s8e6
    }

The code above looks a little funny but is essentially Haskell. Note that the first thing parseAsc2 does is force its second argument to be evaluated (the case statement evaluates the tuple, which corresponds to the pattern match) - but not the string. The string won't be touched until deep inside $wParseAsc2 (definition omitted). But the part of the function that computes the "parse" is inside the lambda - it will be recomputed for every invocation of the function. You don't even have to look at what it is - the rules for evaluating core expressions are very prescriptive.

$wparseAsc
$wparseAsc =
  \ w_s8g9 ww_s8gg ww1_s8gi -> ...

parseAsc
parseAsc =
  \ w_s8g9 w1_s8ga ->
    case w1_s8ga of _ { (ww1_s8gd, ww2_s8gi) ->
    case ww1_s8gd of _ { D# ww4_s8gg ->
    $wparseAsc w_s8g9 ww4_s8gg ww2_s8gi
    }
    }

The situation with parseAsc has little to do with Parsec*. This is much like version two - now both arguments are evaluated, however. This has little effect, however, on the performance, because the same problem is there - $wparseAsc is just a lambda, meaning all the work it does is done at every invocation of the function. There can be no sharing.

parseAsc3 =
  \ str_a228 ->
    let {
      w_s8c1
      w_s8c1 =
        case $wascArr str_a228
        of _ { (# ww1_s8gm, ww2_s8gn, ww3_s8go, ww4_s8gp #) ->
        (ww1_s8gm, ww2_s8gn, ww3_s8go, ww4_s8gp)
        } } in
    \ w1_s8c2 ->
      case w1_s8c2 of _ { (ww1_s8c5, ww2_s8c6) ->
      $waltitude w_s8c1 ww1_s8c5 ww2_s8c6
      }

Here is the "good" version. It takes a string, applies $wascArr to it, and then the string is never used again. This is crucial - if this function is partially applied to a string, you are left with let w_s = .. in \w1 -> ... - none of this mentions the string, so it can be garbage collected. The long lived reference is to w_s which is your "big data". And note: even if a reference to the string was maintained, and it could not be garbage collected, this version would still be substantially better - simply because it does not recompute the "parse" at each invocation of the function. This is the critical flaw - the fact that the string can be garbage collected immediately is extra.

parseAsc4 =
  \ str_a22a pt_a22b ->
    case pt_a22b of _ { (ww1_s8c5, ww2_s8c6) ->
    $waltitude (ascArr str_a22a) ww1_s8c5 ww2_s8c6
    }

Same issue as version two. Unlike version three, if you partially apply this, you get \w1 -> altitude (ascArr ...) ..., so ascArr is recomputed for every invocation of the function. It doesn't matter how you use this function - it simply won't work the way you want.

parseAsc5 = parseAsc4

Amazingly (to me), GHC figures out that parseAsc5 is precisely the same as parseAsc4! Well this one should be obvious then.


As for why GHC generates this particular core for this code, it really isn't easy to tell. In many cases the only way to guarantee sharing is to have explicit sharing in your original code. GHC does not do common subexpression elimination - parseAsc3 implements manual sharing.


*Maybe the parser itself has some performance issues too, but that isn't the focus here. If you have question about your Parsec parser (performance wise, or otherwise) I encourage you to ask a separate question.

Upvotes: 5

Related Questions