Reputation: 1220
I am trying to use unpack
to decode a binary file. The binary file has the following structure:
ABCDEF\tFFFABCDEF\tFFFF....
where
ABCDEF -> String of fixed length
\t -> tab character
FFF -> 3 Floats
.... -> repeat thousands of times
I know how to do it when types are all the same or with only numbers and fixed length arrays, but I am struggling in this situation. For example, if I had a list of floats I would do
s.unpack('F*')
Or if I had integers and floats like
[1, 3.4, 5.2, 4, 2.3, 7.8]
I would do
s.unpack('CF2CF2')
But in this case I am a bit lost. I was hoping to use a format string such `(CF2)*' with brackets, but it does not work.
I need to use Ruby 2.0.0-p247 if that matters
Example
ary = ["ABCDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1]
s = ary.pack('P7fffP7fff')
then
s.scan(/.{19}/)
["\xA8lf\xF9\xD4\x7F\x00\x00\x9A\x99Y@33\xB3@\x9A\x99\x11", "A\x80lf\xF9\xD4\x7F\x00\x00\x00\x00 @ff\x0EAff"]
Finally
s.scan(/.{19}/).map{ |item| item.unpack('P7fff') }
Error: #<ArgumentError: no associated pointer>
<main>:in `unpack'
<main>:in `block in <main>'
<main>:in `map'
<main>:in `<main>'
Upvotes: 2
Views: 1056
Reputation: 54253
You could read the file in small chunks of 19 bytes and use 'A7fff'
to pack and unpack. Do not use pointers to structure ('p'
and 'P'
), as they need more than 19 bytes to encode your information.
You could also use 'A6xfff'
to ignore the 7th byte and get a string with 6 chars.
Here's an example, which is similar to the documentation of IO.read
:
data = [["ABCDEF\t", 3.4, 5.6, 9.1],
["FEDCBA\t", 2.5, 8.9, 3.1]]
binary_file = 'data.bin'
chunk_size = 19
pattern = 'A7fff'
File.open(binary_file, 'wb') do |o|
data.each do |row|
o.write row.pack(pattern)
end
end
raise "Something went wrong. Please check data, pattern and chunk_size." unless File.size(binary_file) == data.length * chunk_size
File.open(binary_file, 'rb') do |f|
while record = f.read(chunk_size)
puts '%s %g %g %g' % record.unpack(pattern)
end
end
# =>
# ABCDEF 3.4 5.6 9.1
# FEDCBA 2.5 8.9 3.1
You could use a multiple of 19
to speed up the process if your file is large.
Upvotes: 2
Reputation: 2496
When dealing with mixed formats that repeat, and are of a known fixed size, it is often easier to split the string first,
Quick example would be:
binary.scan(/.{LENGTH_OF_DATA}/).map { |item| item.unpack(FORMAT) }
Considering your above example, take the length of the string including the tab character (in bytes), plus the size of a 3 floats. If your strings are literally 'ABCDEF\t'
, you would use a size of 19 (7 for the string, 12 for the 3 floats).
Your final product would look like this:
str.scan(/.{19}/).map { |item| item.unpack('P7fff') }
Per example:
irb(main):001:0> ary = ["ABCDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1]
=> ["ABCDEF\t", 3.4, 5.6, 9.1, "FEDCBA\t", 2.5, 8.9, 3.1]
irb(main):002:0> s = ary.pack('pfffpfff')
=> "\xE8Pd\xE4eU\x00\x00\x9A\x99Y@33\xB3@\x9A\x99\x11A\x98Pd\xE4eU\x00\x00\x00\x00 @ff\x0EAffF@"
irb(main):003:0> s.unpack('pfffpfff')
=> ["ABCDEF\t", 3.4000000953674316, 5.599999904632568, 9.100000381469727, "FEDCBA\t", 2.5, 8.899999618530273, 3.0999999046325684]
The minor differences in precision is unavoidable, but do not worry about it, as it comes from the difference of a 32-bit float and 64-bit double (what Ruby used internally), and the precision difference will be less than is significant for a 32-bit float.
Upvotes: 0