Rojj
Rojj

Reputation: 1220

Ruby - Unpack array with mixed types

I am trying to use unpack to decode a binary file. The binary file has the following structure:

ABCDEF\tFFFABCDEF\tFFFF....

where

ABCDEF -> String of fixed length
\t -> tab character
FFF -> 3 Floats
.... -> repeat thousands of times

I know how to do it when types are all the same or with only numbers and fixed length arrays, but I am struggling in this situation. For example, if I had a list of floats I would do

s.unpack('F*')

Or if I had integers and floats like

[1, 3.4, 5.2, 4, 2.3, 7.8]

I would do

s.unpack('CF2CF2')

But in this case I am a bit lost. I was hoping to use a format string such `(CF2)*' with brackets, but it does not work.

I need to use Ruby 2.0.0-p247 if that matters

Example

ary = ["ABCDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1]
s = ary.pack('P7fffP7fff')

then

s.scan(/.{19}/)
["\xA8lf\xF9\xD4\x7F\x00\x00\x9A\x99Y@33\xB3@\x9A\x99\x11", "A\x80lf\xF9\xD4\x7F\x00\x00\x00\x00 @ff\x0EAff"]

Finally

s.scan(/.{19}/).map{ |item| item.unpack('P7fff') }
Error: #<ArgumentError: no associated pointer>
<main>:in `unpack'
<main>:in `block in <main>'
<main>:in `map'
<main>:in `<main>'

Upvotes: 2

Views: 1056

Answers (2)

Eric Duminil
Eric Duminil

Reputation: 54253

You could read the file in small chunks of 19 bytes and use 'A7fff' to pack and unpack. Do not use pointers to structure ('p' and 'P'), as they need more than 19 bytes to encode your information. You could also use 'A6xfff' to ignore the 7th byte and get a string with 6 chars.

Here's an example, which is similar to the documentation of IO.read:

data = [["ABCDEF\t", 3.4, 5.6, 9.1], 
        ["FEDCBA\t", 2.5, 8.9, 3.1]]
binary_file = 'data.bin'
chunk_size = 19
pattern = 'A7fff'

File.open(binary_file, 'wb') do |o|
  data.each do |row|
    o.write row.pack(pattern)
  end
end

raise "Something went wrong. Please check data, pattern and chunk_size." unless File.size(binary_file) == data.length * chunk_size

File.open(binary_file, 'rb') do |f|
  while record = f.read(chunk_size)
    puts '%s %g %g %g' % record.unpack(pattern)
  end
end
# =>
#    ABCDEF   3.4 5.6 9.1
#    FEDCBA   2.5 8.9 3.1

You could use a multiple of 19 to speed up the process if your file is large.

Upvotes: 2

ForeverZer0
ForeverZer0

Reputation: 2496

When dealing with mixed formats that repeat, and are of a known fixed size, it is often easier to split the string first,

Quick example would be:

binary.scan(/.{LENGTH_OF_DATA}/).map { |item| item.unpack(FORMAT) }

Considering your above example, take the length of the string including the tab character (in bytes), plus the size of a 3 floats. If your strings are literally 'ABCDEF\t', you would use a size of 19 (7 for the string, 12 for the 3 floats).

Your final product would look like this:

str.scan(/.{19}/).map { |item| item.unpack('P7fff') }

Per example:

irb(main):001:0> ary = ["ABCDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1]
=> ["ABCDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1]

irb(main):002:0> s = ary.pack('pfffpfff')
=> "\xE8Pd\xE4eU\x00\x00\x9A\x99Y@33\xB3@\x9A\x99\x11A\x98Pd\xE4eU\x00\x00\x00\x00 @ff\x0EAffF@"

irb(main):003:0> s.unpack('pfffpfff')
=> ["ABCDEF\t", 3.4000000953674316, 5.599999904632568, 9.100000381469727, "FEDCBA\t", 2.5, 8.899999618530273, 3.0999999046325684]

The minor differences in precision is unavoidable, but do not worry about it, as it comes from the difference of a 32-bit float and 64-bit double (what Ruby used internally), and the precision difference will be less than is significant for a 32-bit float.

Upvotes: 0

Related Questions