Reputation: 52067
In an object-oriented language when I need to cache/memoize the results of a function for a known life-time I'll generally follow this pattern:
This object-based approach is very similar to the function-based memoization pattern described here: http://www.bardiak.com/2012/01/javascript-memoization-pattern.html
The main benefit of this approach is that the results are kept around only for the life time of the cache object. A common use case is in the processing of a list of work items. For each work item one creates the cache object for that item, processes the work item with that cache object then discards the work item and cache object before proceeding to the next work item.
What are good ways to implement short-lived memoization in Haskell? And does the answer depend on if the functions to be cached are pure or involve IO?
Just to reiterate - it would be nice to see solutions for functions which involve IO.
Upvotes: 9
Views: 1047
Reputation: 4261
I believe the above answers are both more complex than necessary, although they might be more portable than what I'm about to describe.
As I understand it, there is a rule in ghc
that each value is computed exactly once when it's enclosing lambda expression is entered. You may thus create exactly your short lived memoization object as follows.
import qualified Data.Vector as V
indexerVector :: (t -> Int) -> V.Vector t -> Int -> [t]
indexerVector idx vec = \e -> tbl ! e
where m = maximum $ map idx $ V.toList vec
tbl = V.accumulate (flip (:)) (V.replicate m [])
(V.map (\v -> (idx v, v)) vec)
What does this do? It groups all the elements in the Data.Vector t
passed as it's second argument vec
according to integer computed by it's first argument idx
, retaining their grouping as a Data.Vector [t]
. It returns a function of type Int -> [t]
which looks up this grouping by this pre-computed index value.
Our compiler ghc
has promised that tbl
shall only be thunked once when we invoke indexerVector
. We may therefore assign the lambda expression \e -> tbl ! e
returned by indexVector
to another value, which we may use repeatedly without fear that tbl
ever gets recomputed. You may verify this by inserting a trace
on tbl
.
In short, your caching object is exactly this lambda expression.
I've found that almost anything you can accomplish with a short term object can be better accomplished by returning a lambda expression like this.
Upvotes: 2
Reputation: 336
You can use very same pattern in haskell too. Lazy evaluation will take care of checking whether value is evaluated already. It has been mentioned mupltiple times already but code example could be useful. In example below memoedValue
will calculated only once when it is demanded.
data Memoed = Memoed
{ value :: Int
, memoedValue :: Int
}
memo :: Int -> Memoed
memo i = Memoed
{ value = i
, memoedValue = expensiveComputation i
}
Even better you can memoize values which depend on other memoized values. You shoud avoid dependecy loops. They can lead to nontermination
data Memoed = Memoed
{ value :: Int
, memoedValue1 :: Int
, memoedValue2 :: Int
}
memo :: Int -> Memoed
memo i = r
where
r = Memoed
{ value = i
, memoedValue1 = expensiveComputation i
, memoedValue2 = anotherComputation (memoedValue1 r)
}
Upvotes: 1
Reputation: 53725
Let's use Luke Palmer's memoization library: Data.MemoCombinators
import qualified Data.MemoCombinators as Memo
import Data.Function (fix) -- we'll need this too
I'm going to define things slightly different from how his library does, but it's basically the same (and furthermore, compatible). A "memoizable" thing takes itself as input, and produces the "real" thing.
type Memoizable a = a -> a
A "memoizer" takes a function and produces the memoized version of it.
type Memoizer a b = (a -> b) -> a -> b
Let's write a little function to put these two things together. Given a Memoizable
function and a Memoizer
, we want the resultant memoized function.
runMemo :: Memoizer a b -> Memoizable (a -> b) -> a -> b
runMemo memo f = fix (f . memo)
This is a little magic using the fixpoint combinator (fix
). Never mind that; you can google it if you are interested.
So let's write a Memoizable
version of the classic fib example:
fib :: Memoizable (Integer -> Integer)
fib self = go
where go 0 = 1
go 1 = 1
go n = self (n-1) + self (n-2)
Using a self
convention makes the code straightforward. Remember, self
is what we expect to be the memoized version of this very function, so recursive calls should be on self
. Now fire up ghci.
ghci> let fib' = runMemo Memo.integral fib
ghci> fib' 10000
WALL OF NUMBERS CRANKED OUT RIDICULOUSLY FAST
Now, the cool thing about runMemo
is you can create more than one freshly memoized version of the same function, and they will not share memory banks. That means that I can write a function that locally creates and uses fib'
, but then as soon as fib'
falls out of scope (or earlier, depending on the intelligence of the compiler), it can be garbage collected. It doesn't have to be memoized at the top level. This may or may not play nicely with memoization techniques that rely on unsafePerformIO
. Data.MemoCombinators
uses a pure, lazy Trie, which fits perfectly with runMemo
. Rather than creating an object which essentially becomes a memoization manager, you can simply create memoized functions on demand. The catch is that if your function is recursive, it must be written as Memoizable
. The good news is you can plug in any Memoizer
that you wish. You could even use:
noMemo :: Memoizer a b
noMemo f = f
ghci> let fib' = runMemo noMemo fib
ghci> fib' 30 -- wait a while; it's computing stupidly
1346269
Upvotes: 12
Reputation: 12710
Lazy-Haskell programming is, in a way, the memoization paradigm taken to a extreme. Also, whatever you do in an imperative language is possible in Haskell, using either IO monad, the ST monad, monad transformers, arrows, or you name what.
The only problem is that these abstraction devices are much more complicated than the imperative equivalent that you mentioned, and they need a pretty deep mind-rewiring.
Upvotes: 4