q9f

Reputation: 11834

Ruby pack and unpack hex value does not return the same value?

I have a hex string of unknown (variable) length that I want to convert to bytes with pack, and back to hex with unpack.

["a"].pack("H*")
# => "\xA0"

I'm getting \xA0 -- is that because it is little-endian? I was expecting \xA or \x0A.

Likewise, I get the hex string a0 when unpacking again:

["a"].pack("H*").unpack("H*").first
=> "a0"

Again, I was expecting a or 0a, so I'm a bit confused. Is this all the same?

I would prefer big-endian for hex strings but it appears that .pack does not accept endianness for H:

["a"].pack("H>*").unpack("H>*")
ArgumentError: '>' allowed only after types sSiIlLqQjJ (ArgumentError)
from <internal:pack>:8:in `pack'

How can I get big-endian hex values from unpack?

Upvotes: 3

Views: 129

Answers (2)

user513951

Reputation: 13715

Let's collect some facts.


First, from "How to identify the endianness of given hex string?":

"Bytes don't have endianness." – @MichaelPetch

– @VC.One

You only get endianness once you start stringing bytes together. So that's not at issue here.
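To make that concrete, here is a small sketch of my own (not taken from the linked question) using the 16-bit integer directive S, which does accept the > and < modifiers. The variable names are only for illustration.

```ruby
# Endianness only shows up when one value spans several bytes.
value = 0x0102

big_endian    = [value].pack("S>") # most significant byte first
little_endian = [value].pack("S<") # least significant byte first

p big_endian.bytes    # => [1, 2]
p little_endian.bytes # => [2, 1]

# A single byte has only one possible layout.
single = [0x0A].pack("C")
p single.bytes        # => [10]
```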


Next, from "What does ["string"].pack('H*') mean?":

[Array.pack] interprets the string as hex numbers, two characters per byte, and converts it to a string with the characters with the corresponding ASCII code.

So your string "a", being one character, doesn't describe even one full byte.
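As a quick illustration of "two characters per byte" (a sketch with made-up inputs, not from the quoted answer):

```ruby
# Two hex digits describe one byte; "41" and "42" are ASCII "A" and "B".
packed = ["4142"].pack("H*")

p packed          # => "AB"
p packed.bytesize # => 2

# One digit is only half a byte, yet the result still occupies a whole byte.
half = ["a"].pack("H*")
p half.bytesize   # => 1
```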


Finally, from the packed_data docs:

['fff'].pack('h3') # => "\xFF\x0F"
['fff'].pack('h4') # => "\xFF\x0F"
['fff'].pack('h5') # => "\xFF\x0F\x00"
['fff'].pack('h6') # => "\xFF\x0F\x00"

This shows that input strings that are

  1. an odd number of characters, or
  2. shorter than the length specified in the pattern,

are treated as though they were right-padded with 0.
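One way to check the right-padding claim yourself is to compare an odd-length input against the same input with a 0 appended. This is only a sketch, and the variable names are mine:

```ruby
# An odd-length hex string packs to the same bytes as the same string
# with one "0" appended on the right.
odd    = ["fff"].pack("h*")
padded = ["fff0"].pack("h*")

p odd.bytes     # => [255, 15]
p odd == padded # => true

# The same holds for the high-nibble-first directive.
same_for_high = ["a"].pack("H*") == ["a0"].pack("H*")
p same_for_high # => true
```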


Putting all this together: Array#pack is, in effect, padding your too-short input string "a" with a 0 on the right so that it describes a full byte, and everything from there on treats the input as the string "a0".

If you're not satisfied with that behavior, the one lever you can pull is to swap H* for h*, which according to the docs trades "high nibble first" for "low nibble first."

Here's an illustration of the effects of that change. (I'll use f instead of a, because \x0A gets rewritten as \n, making the effect harder to see.)

# Determines how order of nibbles ("half-bytes") is interpreted
["f0"].pack("H*") # => "\xF0"
["f0"].pack("h*") # => "\x0F"
["0f"].pack("H*") # => "\x0F"
["0f"].pack("h*") # => "\xF0"

# Always right-pads input ("f" matches behavior of "f0…", never "…0f")
["f"].pack("H*") # => "\xF0"
["f"].pack("h*") # => "\x0F"
["f"].pack("H4") # => "\xF0\x00"
["f"].pack("h4") # => "\x0F\x00"

# Changes nothing in the round-trip conversion
["f0"].pack("H*").unpack("H*") # => ["f0"]
["f0"].pack("h*").unpack("h*") # => ["f0"]
["0f"].pack("H*").unpack("H*") # => ["0f"]
["0f"].pack("h*").unpack("h*") # => ["0f"]

It seems like this nibble-ordering is what you had in mind when you asked about endianness, so I hope this helps. However, note that whichever nibble order you choose, a 1-character input string will always be right-padded with a zero, never left-padded.

Upvotes: 4

Casper

Reputation: 34328

Endianness is not a concept that applies to single bytes. When converting hex to bytes, the Ruby pack method talks about nibbles instead. A nibble is one 4-bit half of an 8-bit byte, so each byte holds two nibbles.

You have H which packs high nibble first, which is the normal way you'd expect hex to be packed into bytes, and then you have h which packs low nibble first.
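If it helps, here is a rough sketch of what "high" and "low" nibble mean in terms of bit operations, followed by the two directives applied to the same input. The variable names are just for illustration.

```ruby
byte = 0xA0

high_nibble = byte >> 4   # upper four bits
low_nibble  = byte & 0x0F # lower four bits

p high_nibble # => 10
p low_nibble  # => 0

# H writes the first hex digit into the high nibble,
# h writes it into the low nibble.
p ["a0"].pack("H*").bytes # => [160]
p ["a0"].pack("h*").bytes # => [10]
```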

What you're looking for is H, high nibble first (what you refer to as big-endian). The only thing you're missing is that pack puts the lone final digit of an odd-length hex string into the high nibble and fills the low nibble with zero.

Therefore "a" will always be interpreted as 0xA0 by pack, when what you want is 0x0A.

To fix this, all you need to do is pad odd-length hex strings with a 0 at the beginning, and you should get the results you want.

def hexpack(str)
  [(str.length.odd? ? "0" + str : str)].pack("H*")
end

hexpack("a").unpack("H*")
=> ["0a"]
hexpack("ab").unpack("H*")
=> ["ab"]
hexpack("abc").unpack("H*")
=> ["0abc"]
hexpack("10203").bytes
=> [1, 2, 3]

Upvotes: 3
