Reputation: 21
I'm now developing binary parsing program using Haskell. I currently found out that strict/lazy both BitGet seems to be very slow and surprisingly allocate a lot of memory.
I tested below code (built with -O2), such as parsing entire bits in the input file, and figure out the profiling result. For this example, I used the 1,819,173 bytes binary file.
Strict version:
import Prelude as P
import System.Environment (getArgs)
import Data.ByteString as B
import Data.Binary.Strict.BitGet
coreFunc :: Int -> BitGet Int
coreFunc len = f len 0
where
f 0 r = return r
f l _ = do
b <- getBit
f (l - 1) $ if b then 1 else 0
mainFunc :: B.ByteString -> IO ()
mainFunc bs =
case runBitGet bs (coreFunc ((B.length bs) * 8)) of
Left emsg -> error emsg
Right r -> print $ show r
main :: IO ()
main = do
args <- getArgs
case args of
[] -> return ()
(x:_) -> (do
bs <- B.readFile x
mainFunc bs
return ()
)
-- profiling result --
total time = 1.74 secs (1741 ticks @ 1000 us, 1 processor)
total alloc = 7,948,043,192 bytes (excludes profiling overheads)
Lazy version:
import Prelude as P
import System.Environment (getArgs)
import Data.ByteString.Lazy as B
import Data.Binary.Bits.Get
import Data.Binary.Get
import Data.Int (Int64)
coreFunc :: Int64 -> BitGet Int
coreFunc len = f len 0
where
f 0 r = return r
f l _ = do
b <- getBool
f (l - 1) $ if b then 1 else 0
mainFunc :: B.ByteString -> IO ()
mainFunc bs = do
let r = runGet (runBitGet (coreFunc ((B.length bs) * 8))) bs
print $ show r
main :: IO ()
main = do
args <- getArgs
case args of
[] -> return ()
(x:_) -> (do
bs <- B.readFile x
mainFunc bs
return ()
)
-- profiling result --
total time = 2.21 secs (2207 ticks @ 1000 us, 1 processor)
total alloc = 6,405,531,680 bytes (excludes profiling overheads)
I want to ask that:
Upvotes: 2
Views: 678
Reputation: 6778
It seems like your coreFunc
is supposed to skip forward some (len - 1)
number of bits, then read a single bit as an 0
or 1
and return it in the BitGet
monad. If that's the intent, something like this will be much more efficient.
I'm using the binary-bits
package:
import Control.Applicative
import Data.Binary.Get
coreFunc :: Int -> Get Int
coreFunc len =
fromEnum <$> runBitGet (block (skip (len - 1) *> bool)
skip :: Int -> BitGet ()
skip n = byteString bytes *> word64be bits *> pure ()
where (bytes, bits) = quotRem n 8 -- sizeOf Word8
Unfortunately the package does not have a skip
function to let us skip n
bits, which the binary
package it's based off includes, so I've had to write my own. It's possible a more efficient version could be written with access to the Block
internals, but the library might already be optimizing it well enough that theres no benefit.
I'd like to benchmark this version against yours to get an accurate comparison, can you provide the binary file you use for testing?
Upvotes: 2