Robert Bradley

Reputation: 23

Decoding ByteString Using Encoding

I'm building a script that reads 381 bytes from a file and attempts to decode them. I am interested in 348 of those bytes, which I am labelling "presets". Each 3-byte chunk of the presets ByteString can be decoded into a single Int16, and "values" below are the 116 Int16s I am interested in:

decodeFile :: FilePath -> IO [Maybe PresetValue]
decodeFile filename =
  do h <- openFile (dir ++ filename) ReadMode
     header  <- h `BL.hGet` 32
     presets <- h `BL.hGet` 348
     f7      <- h `BL.hGet` 1
     let values = Bin.runGet getPresets presets
     hClose h
     return values

getPresets = do
  empty <- Bin.isEmpty
  if empty
    then return []
    else do p  <- getAndDecodeTriple
            ps <- getPresets
            return (p:ps)

getAndDecodeTriple = do
  b1 <- Bin.getWord8
  b2 <- Bin.getWord8
  b3 <- Bin.getWord8
  return $ decode (b1,b2,b3)

The problem I am having is decoding a 3-byte chunk, given that I know how it was encoded in C++.

Here is the C++ encoding:

void SysexReader::sx_encode(int val, char* dest)
{
    char encode;
    
    // Encode Byte 1 (4 bits of payload)
    encode = 0x40 | ((val >> 12) & 0x000F);
    *dest++ = encode;
    
    // Encode Byte 2 (6 bits of payload)
    encode = (val >> 6) & 0x003F;
    *dest++ = encode;
    
    // Encode Byte 3 (6 bits of payload)
    encode = val & 0x003F;
    *dest = encode;
}

Here is the C++ encoding translated to Haskell...

type Encoding a  = (a,a,a)
type PresetValue = Int16

encode :: Integral a => PresetValue -> Encoding a
encode val =
  let f = fromIntegral
  in (f $ enc1 val, f $ enc2 val, f $ enc3 val)
  where
    enc1 = or40 . and000F . (flip shiftR 12)
      where and000F = (0x000F .&.)
            or40    = (0x40 .|.)
    enc2 = enc3 . flip shiftR 6
    enc3 = (0x003F .&.)

My attempt at decoding uses the fact that I have the encoding procedure and that I know a PresetValue can only be in the range 0 to 127:

--    (3 Sysex Bytes) -> (Preset Value)   --
-------------------------------------------------------
decode :: Integral a => (a,a,a) -> Maybe PresetValue
decode encoded =
  case match of
    [value] -> Just value
    []      -> Nothing  --error "encode not surjective"
    many    -> error "encode not injective"
  where
    match = filter (\x -> encode x == encoded) [0..127]

Unfortunately I can't decode all values, as you can see from the 116-entry list below containing Nothing in many places.

[Just 14,Just 84,Just 97,Just 117,Just 114,Just 117,Just 115,Just 32,Just 73,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Nothing,Nothing,Just 0,Nothing,Nothing,Nothing,Just 0,Nothing,Nothing,Just 0,Just 0,Nothing,Nothing,Just 0,Just 1,Nothing,Just 0,Nothing,Nothing,Just 0,Just 0,Just 0,Just 1,
Just 0,Just 0,Nothing,Just 5,Just 0,Just 1,Just 0,Just 0,Just 0,Nothing,Nothing,
Just 3,Just 2,Just 0,Just 0,Nothing,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Nothing,Nothing,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Just 0,Nothing]

What am I doing wrong? I feel like it must be the types I am using to represent each chunk from the incoming file. Or maybe I'm losing information using fromIntegral.

I've been a developer for a while and have never posted a question on here and always fought through for an answer, but I'm really lost on this one. Thanks.

Upvotes: 1

Views: 239

Answers (1)

K. A. Buhr

Reputation: 50819

It might be better to use openBinaryFile in place of openFile. This shouldn't make a difference here, since I believe hGet ignores whether a file has been opened in text or binary mode, but it's good practice.
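As a rough sketch of that suggestion, the three reads could be wrapped in withBinaryFile, which opens the handle in binary mode and closes it when the action finishes. The name readSections is hypothetical, and the section sizes (32, 348, 1) are taken from the question.

```haskell
import qualified Data.ByteString.Lazy as BL
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Hypothetical helper: read the 32-byte header, the 348 preset bytes,
-- and the single trailing byte from a file opened in binary mode.
readSections :: FilePath -> IO (BL.ByteString, BL.ByteString, BL.ByteString)
readSections path =
  withBinaryFile path ReadMode $ \h -> do
    header  <- BL.hGet h 32
    presets <- BL.hGet h 348
    f7      <- BL.hGet h 1
    return (header, presets, f7)
```

If the file is shorter than expected, hGet simply returns fewer bytes, so checking BL.length on each section is a cheap sanity test.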

It would also be better to use a Word16 in place of your Int16. The C code is using an int, so any 16-bit integer value is going to be unsigned. Again, if you really are only dealing with presets in the range [0..127] it shouldn't matter, but it seems like good practice.
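To illustrate why the signedness can matter, here is a small sketch that reassembles the 4 + 6 + 6 payload bits of a triple. The names assemble, asUnsigned and asSigned are made up for this example, and no range checking is done.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Int (Int16)
import Data.Word (Word16, Word8)

-- Reassemble the payload bits of a triple into an unsigned 16-bit value.
-- Illustration only: the marker bits of the first byte are not validated.
assemble :: (Word8, Word8, Word8) -> Word16
assemble (a, b, c) =
  (fromIntegral a .&. 0xf) `shiftL` 12
    .|. (fromIntegral b .&. 0x3f) `shiftL` 6
    .|. (fromIntegral c .&. 0x3f)

-- The top payload bit is set, so the unsigned reading is 32768.
asUnsigned :: Word16
asUnsigned = assemble (0x48, 0x00, 0x00)

-- The same 16 bits read as Int16 come out as -32768.
asSigned :: Int16
asSigned = fromIntegral asUnsigned
```

Any triple whose first byte is in the range 0x48 to 0x4f sets the top bit, so it is read as a negative number when the result type is Int16.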

There's nothing obviously wrong with your code that I can see, but it's pretty much impossible to duplicate your problem without access to the input file. I might suggest using a better implementation of decode:

decode :: (Word8, Word8, Word8) -> Maybe PresetValue
decode (a,b,c)
  |  0x40 <= a && a <= 0x4f
  && b <= 0x3f && c <= 0x3f
  = Just $ (fromIntegral a .&. 0xf) `shiftL` 12 .|. fromIntegral b `shiftL` 6 .|. fromIntegral c
decode _ = Nothing

which handles all possible encoded preset values from 0 to 65535. If you still get Nothing values in your decode, then the encoded file is probably corrupt.
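One way to gain confidence in a direct decoder like this is to round-trip it against the encoder over every 16-bit value. The sketch below assumes Word16 as the value type, and encode16, decode16 and roundTrips are hypothetical names. encode16 follows the C++ sx_encode from the question.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word16, Word8)

-- Mirror of sx_encode: 4 payload bits (tagged with 0x40), then 6, then 6.
encode16 :: Word16 -> (Word8, Word8, Word8)
encode16 v =
  ( fromIntegral (0x40 .|. ((v `shiftR` 12) .&. 0x0f))
  , fromIntegral ((v `shiftR` 6) .&. 0x3f)
  , fromIntegral (v .&. 0x3f)
  )

-- Direct decoder with the same range checks as the decode above.
decode16 :: (Word8, Word8, Word8) -> Maybe Word16
decode16 (a, b, c)
  | 0x40 <= a && a <= 0x4f && b <= 0x3f && c <= 0x3f =
      Just ((fromIntegral a .&. 0xf) `shiftL` 12
              .|. fromIntegral b `shiftL` 6
              .|. fromIntegral c)
  | otherwise = Nothing

-- True when every 16-bit value survives encode followed by decode.
roundTrips :: Bool
roundTrips = all (\v -> decode16 (encode16 v) == Just v) [minBound .. maxBound]
```

For example, decode16 (0x40, 0x02, 0x00) gives Just 128. A search restricted to [0..127] cannot find that value, which may be one source of the Nothing entries if the file holds values above 127.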

It looks like the first bad value is at offset 19 (counting from 0), corresponding to bytes 57-59 (0x39-0x3B) of the presets, or accounting for the 32-byte header, bytes 89-91 (0x59-0x5B) of the file. It might be helpful to open the file in a hex editor and see what three bytes are at that offset that are giving you trouble.
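If a hex editor isn't handy, the same lookup can be sketched in Haskell. With a 32-byte header, triple 19 starts at file offset 32 + 3 * 19 = 89 (0x59). The helpers tripleOffset and dumpTriple are hypothetical names, and the input is assumed to be the whole file contents as a lazy ByteString.

```haskell
import qualified Data.ByteString.Lazy as BL
import Numeric (showHex)

-- File offset of the first byte of preset triple i (0-based),
-- given the number of header bytes that precede the presets.
tripleOffset :: Int -> Int -> Int
tripleOffset headerLen i = headerLen + 3 * i

-- Hex dump of the three bytes of triple i, taken from the whole file.
-- Fewer than three bytes are shown if the file ends early.
dumpTriple :: Int -> Int -> BL.ByteString -> String
dumpTriple headerLen i file =
  unwords [showHex w "" | w <- BL.unpack chunk]
  where
    start = fromIntegral (tripleOffset headerLen i)
    chunk = BL.take 3 (BL.drop start file)
```

Calling dumpTriple 32 19 on the contents returned by BL.readFile prints the three suspect bytes, which can then be compared against the ranges the decoder expects (0x40 to 0x4f for the first byte, at most 0x3f for the other two).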

Upvotes: 1
