Reputation: 4233
I want to process a couple of hundred binary data chunks ("scenarios") for a Monte Carlo simulation. Each scenario consists of 1 million floats. Here's how I create a dummy binary file for the scenario data:
import Data.Binary
import qualified Data.ByteString.Lazy as B
import Data.Array.Unboxed

scenSize = 1000000
scens = 100

main = do
  let xs = array (1, scenSize) [(i, 0.0) | i <- [1..scenSize]] :: UArray Int Float
  let l = take scens $ Prelude.repeat xs
  B.writeFile "bintest.data" (encode l)
  return ()
This works fine. Now I want to process the scenarios. Since there can be a lot of them (scens = 1000 or so), the processing should happen lazily, one chunk at a time. I tried decodeFile, but this does not seem to work:
import Data.Binary
import qualified Data.Array.IArray as IA
import Data.Array.Unboxed as A

main = do
  bs <- decodeFile "bintest.data" :: IO [UArray Int Float]
  mapM_ doStuff bs
  return ()

doStuff b =
  Prelude.putStrLn $ show $ b IA.! 100000
This program seems to load all the data into memory first and only prints the numbers at the end of the run. It also uses a lot of memory and crashes for scens = 500 on my 32-bit Ubuntu machine.
What am I doing wrong? Is there an easy way to make the program run lazily?
Upvotes: 2
Views: 376
Reputation: 64740
decodeFile is not lazy; just look at the source: it calls decodeOrFail, which must parse the whole file to determine success or failure.
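To see why that rules out streaming, here is a rough sketch (not the actual library source) of the kind of thing decodeFile has to do; it assumes import Data.Binary and import qualified Data.ByteString.Lazy as L:

-- Rough sketch only: to report success or failure, the whole input has
-- to be parsed, so the fully built value exists before anything is returned.
decodeFileSketch :: Binary a => FilePath -> IO a
decodeFileSketch f = do
  bs <- L.readFile f
  case decodeOrFail bs of              -- must consume the entire input
    Left (_, _, msg) -> error msg      -- failure is only known at the end
    Right (_, _, a)  -> return a       -- the fully built value is handed back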
EDIT:
So what I believe worked in the original binary package is now broken (read: it's now a non-lazy memory hog). One solution, which I doubt is optimal or pretty, is to use a lazy readFile with runGetIncremental and manually push chunks into the decoder:
import Data.Binary
import Data.Binary.Get
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString as B
import qualified Data.Array.IArray as IA
import Data.Array.Unboxed as A

main = do
  bs <- getListLazy `fmap` L.readFile "bintest2.data"
  mapM_ doStuff bs
  return ()

doStuff b = print $ b IA.! 100000
The important stuff is here:
getListLazy :: L.ByteString -> [UArray Int Float]
getListLazy lz = go decodeUArray (L.toChunks lz)
  where
    go :: Decoder (UArray Int Float) -> [B.ByteString] -> [UArray Int Float]
    go _ [] = []
    go dec (b:bs) =
      case pushChunk dec b of
        -- one array fully decoded; restart the decoder on the leftover bytes
        Done b' _ a -> a : go decodeUArray (b' : bs)
        -- decoder wants more input; feed it the next chunk, if there is one
        Partial f   -> case bs of
                         (x:xs) -> go (f $ Just x) xs
                         []     -> []
        Fail _ _ s  -> error s -- alternatively use '[]'

decodeUArray :: Decoder (UArray Int Float)
decodeUArray = runGetIncremental get
Notice this solution doesn't bother decoding a list length and plumbing it through the decoder; instead I changed your generator code to write the arrays back to back rather than as a single encoded list of arrays.
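For reference, a minimal sketch of what that modified generator might look like (the file name and sizes are just placeholders): each array is encoded on its own and the results are concatenated, so the file is a plain sequence of arrays with no list-length prefix.

import Data.Binary
import qualified Data.ByteString.Lazy as L
import Data.Array.Unboxed

scenSize = 1000000
scens = 100

main :: IO ()
main = do
  let xs = array (1, scenSize) [(i, 0.0) | i <- [1..scenSize]] :: UArray Int Float
  -- encode each scenario separately and concatenate, instead of encoding [UArray Int Float]
  L.writeFile "bintest2.data" (L.concat [encode xs | _ <- [1 .. scens]])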
To avoid code like this, I think pipes would be the way to go.
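Purely as a rough illustration of that direction (assuming the pipes package; the names scenarios and the file name are my own, and it still reuses getListLazy from above rather than a proper pipes-based decoder), the processing could be composed as a pipeline:

import Pipes
import qualified Pipes.Prelude as P
import Control.Monad.IO.Class (liftIO)
import qualified Data.ByteString.Lazy as L
import qualified Data.Array.IArray as IA
import Data.Array.Unboxed

-- Yields one decoded array at a time, driven by downstream demand.
scenarios :: FilePath -> Producer (UArray Int Float) IO ()
scenarios path = do
  lz <- liftIO (L.readFile path)
  each (getListLazy lz)

main :: IO ()
main = runEffect $
  scenarios "bintest2.data" >-> P.map (\b -> b IA.! 100000) >-> P.print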
Upvotes: 4