Kostya

Reputation: 1102

Read 16-bit little-endian, then parse as a bitstring in Erlang

I've inherited a binary file format with the following specification:

  |     F      | E | D | C | B | A | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
0:| Status bit |        ------ 15 - bit unsigned integer -----------
1:| Status bit |        ----  uint:10  ----            | ---- uint:5 ---- 

Bit matching in Erlang is awesome. So I'd love to do something like this:

<<StatBit1:1, ValA:15/unsigned>> = <<2#1000000000101010:16>>.
<<StatBit2:1, ValB:10/unsigned, ValC:5/unsigned>> = <<2#0000001010100111:16>>.

The problem is that the file I need to process stores each 16-bit word in little-endian byte order (low byte first). So the first 8 bits of the file in the example above would be 00101010, then 10000000, etc.

{ok, S} = file:open("datafile", [read, binary, raw]).
{ok, <<Byte1:8, Byte2:8, Byte3:8, Byte4:8>>} = file:read(S,4).
io:format(
     " ~8.2.0B | ~8.2.0B | ~8.2.0B | ~8.2.0B ~n ", 
     [Byte1, Byte2, Byte3, Byte4]).

# 00101010 | 10000000 | 10100111 | 00000010
# ok

So I resort to reading and swapping the bytes:

<<StatBit1:1, ValA:15/unsigned>> = <<Byte2:8, Byte1:8>>.
<<StatBit2:1, ValB:10/unsigned, ValC:5/unsigned>> = <<Byte4:8, Byte3:8>>.

Alternatively, I can read 16-bit little-endian words and then "parse" them:

{ok, S} = file:open("datafile", [read, binary, raw]).
{ok, <<DW1:16/little, DW2:16/little>>} = file:read(S,4).
<<StatBit1:1, ValA:15/unsigned>> = <<DW1:16>>.
<<StatBit2:1, ValB:10/unsigned, ValC:5/unsigned>> = <<DW2:16>>.

Both solutions leave me equally frustrated. I still suspect that there is a nicer way of dealing with this kind of situation. Is there?

Upvotes: 1

Views: 858

Answers (3)

Pascal

Reputation: 14042

Did you try something like the following? [edit] I made some corrections, but I can't test this on my tablet.

decode(<<A:8, 1:1, B:7>>) -> {status1, B*256+A};
decode(<<A:3, C:5, 0:1, B:7>>) -> {status2, B*8+A, C}.
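
Note that the two clauses above match on a fixed status bit (1 in the first, 0 in the second), so they use the status bit to tell the two word layouts apart. If the status bit is just data, a variant that binds it to a variable might look like the sketch below. The module name `le_fields` and the function names `decode_word0/1` and `decode_word1/1` are made up for illustration, and the caller is assumed to know which layout each word has.

```erlang
-module(le_fields).
-export([decode_word0/1, decode_word1/1]).

%% Word 0: status bit + 15-bit unsigned integer, stored low byte first.
%% The file gives us the low 8 bits, then the status bit, then the high 7 bits.
decode_word0(<<Lo:8, Stat:1, Hi:7>>) ->
    {Stat, (Hi bsl 8) bor Lo}.

%% Word 1: status bit + uint:10 + uint:5, stored low byte first.
%% The low byte holds the 3 low bits of the uint:10 followed by the uint:5;
%% the high byte holds the status bit and the 7 high bits of the uint:10.
decode_word1(<<BLo:3, C:5, Stat:1, BHi:7>>) ->
    {Stat, (BHi bsl 3) bor BLo, C}.
```

With the bytes from the question, `le_fields:decode_word0(<<2#00101010, 2#10000000>>)` gives `{1, 42}` and `le_fields:decode_word1(<<2#10100111, 2#00000010>>)` gives `{0, 21, 7}`.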

Upvotes: 0

RichardC

Reputation: 10557

As an explanation of why the binary syntax (as it is) can't solve your problem, consider that the bits in your file really are in the order 7, ..., 0, F, E, ..., 8. The status bit is in F, but if you say "the next field is 15 bits long, and is a little-endian unsigned integer", you'll get bits 7, ..., 0, F, E, ..., 9 (the next 15 bits), which will then be interpreted as little-endian. You can't express the fact that you'd like to skip bit F and use bits E through 8 instead, and then go back and pick up bit F for the status. If you could byte-swap the file first, e.g. with "dd if=infile of=outfile conv=swab", you'd make your life a whole lot easier.
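
To make this concrete, here is a small sketch comparing the tempting pattern with the two-step version from the question. The module name `le_naive` and both function names are hypothetical. The naive clause takes its "status" from the very first bit in the file, which is bit 7 of the low byte rather than bit F, so for the example bytes it comes out as 0 instead of 1 and the value is not 42.

```erlang
-module(le_naive).
-export([naive/1, via_word/1]).

%% Tempting but wrong for this format: the first bit in the file is
%% bit 7 of the low byte, not the status bit F.
naive(<<Stat:1, Val:15/little>>) ->
    {Stat, Val}.

%% Read the whole 16-bit word as little-endian first, then re-pack it
%% (big-endian by default) and split the fields in specification order.
via_word(<<Word:16/little>>) ->
    <<Stat:1, Val:15>> = <<Word:16>>,
    {Stat, Val}.
```

For `<<2#00101010, 2#10000000>>`, `via_word/1` returns `{1, 42}`, while `naive/1` returns a status of 0.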

Upvotes: 1

Steve Vinoski

Reputation: 20004

I'd first look into changing the application generating these files to write the data in network (big-endian) order. If that's not possible, then you're stuck with byte swapping like you're already doing. You could wrap the swapping into a function to keep it out of your decoding logic:

byteswap16(F) ->
    case file:read(F, 2) of
        {ok, <<B1:8,B2:8>>} -> {ok, <<B2:8,B1:8>>};
        Else -> Else
    end.
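
As a rough usage sketch, the helper can then feed a decoder that only ever sees specification-order bits. The module name `le_stream` and the function `read_word0/1` are hypothetical, and the sketch assumes the next word in the file uses the "status bit + 15-bit integer" layout.

```erlang
-module(le_stream).
-export([byteswap16/1, read_word0/1]).

%% Same helper as above: read two bytes and return them swapped.
byteswap16(F) ->
    case file:read(F, 2) of
        {ok, <<B1:8, B2:8>>} -> {ok, <<B2:8, B1:8>>};
        Else -> Else
    end.

%% Decode one word of the first layout; eof, errors and short reads
%% are passed through unchanged.
read_word0(F) ->
    case byteswap16(F) of
        {ok, <<Stat:1, Val:15>>} -> {ok, {Stat, Val}};
        Else -> Else
    end.
```

Reading two bytes per call from a raw file can be slow on large inputs; opening the file with the `read_ahead` option may help there.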

Alternatively, perhaps you could preprocess the file. You mentioned in your comment that the files are huge, so maybe this isn't practical for your case, but if each file fits comfortably in memory you could use file:read_file/1 to read the whole file and then preprocess the contents using a binary comprehension:

byteswap16(Filename) ->
    {ok,Bin} = file:read_file(Filename),
    << <<B2:8,B1:8>> || <<B1:8,B2:8>> <= Bin >>.

Both these solutions assume the entire file is written in 16-bit little endian format.
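
If the data does fit in memory, the swap and the field extraction can also be folded into one comprehension. This is only a sketch: it assumes (the question does not say so) that the file is a sequence of 4-byte records, each consisting of the first word layout followed by the second. The module name `le_records` and its functions are made up for illustration.

```erlang
-module(le_records).
-export([decode_all/1]).

%% Assumption: Bin is a sequence of 4-byte records, word 0 then word 1,
%% each stored as a 16-bit little-endian word.
decode_all(Bin) ->
    [decode_record(W0, W1) || <<W0:16/little, W1:16/little>> <= Bin].

%% Re-pack each word big-endian and split it in specification order.
decode_record(W0, W1) ->
    <<Stat0:1, ValA:15>> = <<W0:16>>,
    <<Stat1:1, ValB:10, ValC:5>> = <<W1:16>>,
    {{Stat0, ValA}, {Stat1, ValB, ValC}}.
```

For the four example bytes from the question, `le_records:decode_all(<<2#00101010, 2#10000000, 2#10100111, 2#00000010>>)` returns `[{{1, 42}, {0, 21, 7}}]`. Trailing bytes that do not form a full record are silently dropped by the generator.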

Upvotes: 1
