Reputation: 7489
I have four integers {a
, b
, c
, d
} that can have the following range of values:
a
- {0 or 1} (1 bit)
b
- {0 or 1} (1 bit)
c
- {0, 1, 2, ..., 7} (3 bits)
d
- {0, 1, 2, ..., 7} (3 bits)
at first, I would like to pack them into a one byte that can be then written to a binary file.
later, I would like to unpack that one byte and get from it to a tuple of the form (a
, b
, c
, d
).
I know how to read/write a byte to a binary file in python. But how do I do the bits packing/unpacking?
Upvotes: 17
Views: 3031
Reputation: 2915
nested arithmetic form (NAF) :
p = 8 * (8 * (2 * a + b) + c) + d # "a" at MSB
p = a + (b + (c + d * 8) * 2) * 2 # "a" at LSB
nested shifting form (NSF) :
p = ( ( ( ( (a << 1) + b) << 3) + c) << 3) + d # "a" at MSB
p = ( ( ( ( (d << 3) + c) << 1) + b) << 1) + a # "a" at LSB
Personally I think NAF is much cleaner (even if not the fastest), because it keeps all the digit groups on the same side and all the scaling factors on the other regardless of endian-ness, all while minimizing the layers of nesting required, and total arithmetic ops.
"NAF" and "NSF" aren't formal terms - it's just a colloquial way of describing them
So packing it would be chr(p)
or just sprintf("%c", p)
as for decoding, using the "a" at MSB approach, it's basically splitting out a byte's octal codes back out to its components ("a" at LSB is the same decoding process in little-endian) :
2a+b c d a b
\ \/ \/
======================
62 \0 76 00
63 \0 77 00
64 \1 00 01
65 \1 01 01
126 \1 76 01
127 \1 77 01
128 \2 00 10
129 \2 01 10
190 \2 76 10
191 \2 77 10
192 \3 00 11
193 \3 01 11
254 \3 76 11
255 \3 77 11
0 \0 00 00
1 \0 01 00
2a+b
is really useful when it comes to encoding UTF-8
:
\0 ##
- all theASCII
digits and arithmetic operators, most of theASCII
linguistic punctuation symbols, and nearly all theASCII
[[:cntrl:]]
bytes (sans\177 DEL
)
\1 ##
- all theASCII
letters, plus misc[[:punct:]]
\2 ##
- trailing/"continuation" bytes forUnicode
code pointsU+0080 - U+10FFFFF
\3 ##
- all the leading byte forUnicode
code pointsU+0080 - U+10FFFFF
— sans \300 xC0
- \301 xC1
and \365 xF5
- \377 xFF
Upvotes: 0
Reputation: 602715
Use shift and bitwise OR, then convert to a character to get a "byte":
x = chr(a | (b << 1) | (c << 2) | (d << 5))
To unpack this byte again, first convert to an integer, then shift and use bitwise AND:
i = ord(x)
a = i & 1
b = (i >> 1) & 1
c = (i >> 2) & 7
d = (i >> 5) & 7
Explanation: Initially, you have
0000000a
0000000b
00000ccc
00000ddd
The left-shifts give you
0000000a
000000b0
000ccc00
ddd00000
The bitwise OR results in
dddcccba
Converting to a character will convert this to a single byte.
Unpacking: The four different right-shifts result in
dddcccba
0dddcccb
00dddccc
00000ddd
Masking (bitwise AND) with 1
(0b00000001
) or 7
(0b00000111
) results in
0000000a
0000000b
00000ccc
00000ddd
again.
Upvotes: 32
Reputation: 21925
If you need to this kind of thing a lot then bit shifting can become tedious and error prone. There are third-party libraries that can help - I wrote one called bitstring:
To pack and convert to a byte:
x = bitstring.pack('2*uint:1, 2*uint:3', a, b, c, d).bytes
and to unpack:
a, b, c, d = bitstring.BitArray(bytes=x).unpack('2*uint:1, 2*uint:3')
This is probably overkill for your example, but it's helpful when things get more complicated.
Upvotes: 5
Reputation: 9895
def encode(a, b, c, d):
return a | b << 1 | c << 2 | d << 5
def decode(x):
return x & 1, (x >> 1) & 1, (x >> 2) & 7, (x >> 5) & 7
Upvotes: 10
Reputation: 501
Pretty simple. Mask (for range), shift them into place, and or them together.
packed = ((a & 1) << 7) | ((b & 1) << 6) | ((c & 7) << 3) | (d & 7)
a = (packed >> 7) & 1
b = (packed >> 6) & 1
c = (packed >> 3) & 7
d = packed & 7
Upvotes: 4