cdk
cdk

Reputation: 6778

Get Arbitrary Slices of Bits from Bytestring

I want to use a lazy Bytestring to represent a stream of bits. I need to be able to take arbitrary slices of bits from this stream efficiently. For example, I might have a ByteString of length 10, and I'd like slice a new ByteString consisting of bits 24-36 from the original ByteString.

The problem is that ByteStrings are arrays of Word8s, so taking ranges that are not multiples of 8 is difficult. The best I've been able to come up with is this, using Data.Binary and Data.Binary.Bits. Note that get32BitRange is specifically for ranges <= 32.

get32BitRange :: Int -> Int -> ByteString -> ByteString
get32BitRange lo hi = runPut . putWord32be
                    . runGet (runBitGet . block $ word8 (8 - (lo `quot` 8)) *> word32be len)
                    . drop offset
    where len = hi - lo
          lo' = lo `div` 8
          offset = fromIntegral lo' - 1

The algorithm is:

It looks more than a little ugly, is there a more efficient way to grab arbitrary slices of bits from a ByteString?

EDIT: here is a more efficient version

get32BitRange :: Int -> Int -> ByteString -> Word32
get32BitRange lo hi = runGet get
    where get = runBitGet . block $ byteString byteOff *> word8 bitOff *> word32be len
          len = hi - lo
          (byteOff, bitOff) = lo `quotRem` 8

Upvotes: 2

Views: 928

Answers (3)

cdk
cdk

Reputation: 6778

I'm going to mark this as resolved. This is what I ended up using:

get32BitRange :: Int -> Int -> ByteString -> Word32
get32BitRange lo hi = assert (lo < hi) $
    runGet (runBitGet bitGet)
    where bitGet = block $ byteString byteOff
                         *> word8 bitOff
                         *> word32be len
          len = hi - lo
          (byteOff, bitOff) = lo `quotRem` 8

Upvotes: 1

sclv
sclv

Reputation: 38893

I think other solutions are way better but you can use the Internal module to get at the underlying structure: http://hackage.haskell.org/packages/archive/bytestring/0.10.2.0/doc/html/src/Data-ByteString-Internal.html#ByteString

data ByteString = PS {-# UNPACK #-} !(ForeignPtr Word8) -- payload
                     {-# UNPACK #-} !Int                -- offset
                     {-# UNPACK #-} !Int                -- length

Then you can use standard pointer tools to generate ByteStrings pointing exactly where you want, by manipulating the ForeignPtr directly...

Upvotes: 2

Ganesh Sittampalam
Ganesh Sittampalam

Reputation: 29100

You can't make this efficient with ByteString as your API type, because it doesn't carry the information that the "bits" you want really start at some offset into the first byte.

Best bet is to make a wrapper type:

data BitStream =
    BitStream {
        info :: ByteString,
        -- values from 0-7: ignore all bits in the first byte up to
        -- but not including this offset
        firstBitOffset :: !Int,to but not including this offset
        -- values from 0-7: ignore all bits in the last byte after
        -- but not including this offset
        lastBitOffset :: !Int
    }

Then you can design a bit-based API around this.

Upvotes: 1

Related Questions