ErikR

Reputation: 52067

Short-lived memoization in Haskell?

In an object-oriented language, when I need to cache/memoize the results of a function for a known lifetime, I'll generally follow this pattern:

  1. Create a new class
  2. Add to the class a data member and a method for each function result I want to cache
  3. Implement the method to first check to see if the result has been stored in the data member. If so, return that value; else call the function (with the appropriate arguments) and store the returned result in the data member.
  4. Objects of this class will be initialized with values that are needed for the various function calls.

This object-based approach is very similar to the function-based memoization pattern described here: http://www.bardiak.com/2012/01/javascript-memoization-pattern.html

The main benefit of this approach is that the results are kept around only for the life time of the cache object. A common use case is in the processing of a list of work items. For each work item one creates the cache object for that item, processes the work item with that cache object then discards the work item and cache object before proceeding to the next work item.
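In Haskell-with-IO terms, a direct transliteration of the steps above might look like the following sketch, using an `IORef` holding a `Maybe` result as the cache slot (`expensiveQuery` and all names here are made-up placeholders, not code I actually have):

```haskell
import Data.IORef

-- A "cache object": holds the input plus a slot for the memoized result.
data Cache = Cache { input :: Int, slot :: IORef (Maybe Int) }

newCache :: Int -> IO Cache
newCache i = Cache i <$> newIORef Nothing

-- Check the slot first; on a miss, compute, store, and return the result.
cachedResult :: Cache -> IO Int
cachedResult (Cache i ref) = do
  cached <- readIORef ref
  case cached of
    Just r  -> return r
    Nothing -> do
      let r = expensiveQuery i  -- stand-in for the real computation
      writeIORef ref (Just r)
      return r

expensiveQuery :: Int -> Int
expensiveQuery i = i * i  -- placeholder
```

This works, but it feels like fighting the language, which is part of why I'm asking.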

What are good ways to implement short-lived memoization in Haskell? And does the answer depend on whether the functions to be cached are pure or involve IO?

Just to reiterate - it would be nice to see solutions for functions which involve IO.

Upvotes: 9

Views: 1047

Answers (4)

Jeff Burdges
Jeff Burdges

Reputation: 4261

I believe the above answers are both more complex than necessary, although they might be more portable than what I'm about to describe.

As I understand it, there is a rule in GHC that each value bound in a `where` clause is computed at most once per entry into its enclosing lambda expression. You may thus create exactly your short-lived memoization object as follows.

import qualified Data.Vector as V

indexerVector :: (t -> Int) -> V.Vector t -> Int -> [t]
indexerVector idx vec = \e -> tbl V.! e
  where m   = maximum $ map idx $ V.toList vec
        tbl = V.accumulate (flip (:)) (V.replicate (m + 1) [])
                 (V.map (\v -> (idx v, v)) vec)

What does this do? It groups all the elements of the `Data.Vector t` passed as its second argument `vec` according to the integer computed by its first argument `idx`, retaining the grouping as a `Data.Vector [t]`. It returns a function of type `Int -> [t]` which looks up this grouping by the pre-computed index value.

GHC guarantees that `tbl` is evaluated at most once per invocation of `indexerVector`. We may therefore bind the lambda expression returned by `indexerVector` to another name and use it repeatedly, without fear that `tbl` ever gets recomputed. You may verify this by inserting a trace on `tbl`.

In short, your caching object is exactly this lambda expression.

I've found that almost anything you can accomplish with a short term object can be better accomplished by returning a lambda expression like this.
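To make the sharing visible, here is the same indexer with a `trace` wired onto `tbl` (the `length`-based grouping and the sample strings are made up for illustration):

```haskell
import qualified Data.Vector as V
import Debug.Trace (trace)

indexerVector :: (t -> Int) -> V.Vector t -> Int -> [t]
indexerVector idx vec = \e -> tbl V.! e
  where m   = maximum $ map idx $ V.toList vec
        tbl = trace "building tbl" $  -- fires only when tbl is first forced
              V.accumulate (flip (:)) (V.replicate (m + 1) [])
                (V.map (\v -> (idx v, v)) vec)

-- group some sample strings by their length
byLen :: Int -> [String]
byLen = indexerVector length (V.fromList ["a", "bb", "cc", "ddd"])
```

In ghci, evaluating `byLen 2` and then `byLen 1` prints `building tbl` once, before the first result only; the second lookup reuses the already-forced table.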

Upvotes: 2

Shimuuar

Reputation: 336

You can use the very same pattern in Haskell too. Lazy evaluation will take care of checking whether a value has already been evaluated. This has been mentioned multiple times already, but a code example could be useful. In the example below, memoedValue is calculated only once, when it is first demanded.

data Memoed = Memoed
  { value       :: Int
  , memoedValue :: Int
  }

memo :: Int -> Memoed
memo i = Memoed
  { value       = i
  , memoedValue = expensiveComputation i
  }

Even better, you can memoize values which depend on other memoized values. You should avoid dependency loops, though: they can lead to nontermination.

data Memoed = Memoed
  { value        :: Int
  , memoedValue1 :: Int
  , memoedValue2 :: Int
  }

memo :: Int -> Memoed
memo i = r
  where
  r = Memoed
    { value        = i
    , memoedValue1 = expensiveComputation i
    , memoedValue2 = anotherComputation (memoedValue1 r)
    }
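To convince yourself that the field really is computed once, and only on demand, you can attach a trace to it (the squaring stands in for `expensiveComputation`, which I haven't defined):

```haskell
import Debug.Trace (trace)

data Memoed = Memoed
  { value       :: Int
  , memoedValue :: Int
  }

memo :: Int -> Memoed
memo i = Memoed
  { value       = i
  , memoedValue = trace "computing" (i * i)  -- stand-in expensive computation
  }
```

In ghci, after `let m = memo 7`, the first `memoedValue m` prints `computing` before returning `49`; evaluating `memoedValue m` again returns `49` silently, because the field's thunk has already been forced.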

Upvotes: 1

Dan Burton

Reputation: 53725

Let's use Luke Palmer's memoization library: Data.MemoCombinators

import qualified Data.MemoCombinators as Memo
import Data.Function (fix) -- we'll need this too

I'm going to define things slightly differently from how his library does, but it's basically the same (and, furthermore, compatible). A "memoizable" thing takes itself as input, and produces the "real" thing.

type Memoizable a = a -> a

A "memoizer" takes a function and produces the memoized version of it.

type Memoizer a b = (a -> b) -> a -> b

Let's write a little function to put these two things together. Given a Memoizable function and a Memoizer, we want the resultant memoized function.

runMemo :: Memoizer a b -> Memoizable (a -> b) -> a -> b
runMemo memo f = fix (f . memo)

This is a little magic using the fixpoint combinator (`fix f = let x = f x in x`): it ties the knot so that the `self` argument a Memoizable function receives is the memoized version of that very function. You can google fix if you want the gory details.

So let's write a Memoizable version of the classic fib example:

fib :: Memoizable (Integer -> Integer)
fib self = go
  where go 0 = 1
        go 1 = 1
        go n = self (n-1) + self (n-2)

Using a self convention makes the code straightforward. Remember, self is what we expect to be the memoized version of this very function, so recursive calls should be on self. Now fire up ghci.

ghci> let fib' = runMemo Memo.integral fib
ghci> fib' 10000
WALL OF NUMBERS CRANKED OUT RIDICULOUSLY FAST

Now, the cool thing about runMemo is that you can create more than one freshly memoized version of the same function, and they will not share memory banks. That means that I can write a function that locally creates and uses fib', but then as soon as fib' falls out of scope (or earlier, depending on the intelligence of the compiler), it can be garbage collected. It doesn't have to be memoized at the top level. This may or may not play nicely with memoization techniques that rely on unsafePerformIO. Data.MemoCombinators uses a pure, lazy trie, which fits perfectly with runMemo.

Rather than creating an object which essentially becomes a memoization manager, you can simply create memoized functions on demand. The catch is that if your function is recursive, it must be written as Memoizable. The good news is you can plug in any Memoizer that you wish. You could even use:

noMemo :: Memoizer a b
noMemo f = f

ghci> let fib' = runMemo noMemo fib
ghci> fib' 30 -- wait a while; it's computing stupidly
1346269
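And if you'd rather not pull in the library at all, any pure Memoizer works. Here is a toy one backed by a lazy list, specialised to non-negative Int arguments (my own sketch, not part of Data.MemoCombinators; lookup is O(n) rather than a proper trie):

```haskell
import Data.Function (fix)

type Memoizable a = a -> a
type Memoizer a b = (a -> b) -> a -> b

runMemo :: Memoizer a b -> Memoizable (a -> b) -> a -> b
runMemo memo f = fix (f . memo)

-- A toy pure Memoizer: cache results in a shared lazy list,
-- indexed by the argument. Valid only for non-negative Ints.
listMemo :: Memoizer Int b
listMemo f = \n -> table !! n
  where table = map f [0 ..]

fib :: Memoizable (Int -> Integer)
fib self = go
  where go 0 = 1
        go 1 = 1
        go n = self (n - 1) + self (n - 2)
```

`runMemo listMemo fib 30` returns `1346269` quickly, because every `self k` call hits the shared `table` instead of recursing from scratch.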

Upvotes: 12

dsign

Reputation: 12710

Lazy Haskell programming is, in a way, the memoization paradigm taken to an extreme. Also, whatever you do in an imperative language is possible in Haskell, using the IO monad, the ST monad, monad transformers, arrows, or what have you.

The only problem is that these abstraction devices are much more complicated than the imperative equivalent that you mentioned, and they require a fair amount of mind-rewiring.

Upvotes: 4
