Reputation: 165
import wave, struct
f = wave.open('bird.wav', 'r')
for i in range(5, 10):
    frame = f.readframes(i)
    print frame
    struct.unpack('<H', frame)
I use the above code to extract bytes from a stereo WAV file in Python. However, instead of bytes I get some gibberish characters. Using the struct.unpack() function, I get the following error:
"unpack requires a string argument of length 2"
What changes do I make in the code to print those bytes as 1s and 0s? I later want to modify the LSB of the audio frames for steganography.
Upvotes: 2
Views: 3974
Reputation: 9796
If you want to modify the lsb of your bytes, there is no point in converting the value to a binary string. Effectively, you would be doing something along these lines:
my_bit_string = '0'                    # the bit to embed, as a character
byte = '\x69'                          # 'i', binary 01101001
binary = '{0:08b}'.format(ord(byte))   # '01101001', a string of 1s and 0s
binary = binary[:7] + my_bit_string    # replace the last (least significant) character
byte = chr(int(binary, 2))             # back to a one-byte string: 'h' (0x68)
There is a more direct and efficient way to modify a bit value: bitwise operators. For example, let's say we want to change 01001001 (decimal 73) to 01001000. We create a bitmask 11111110, which is 254 in decimal, and AND it with our value.
>>> value = 73 & 254
>>> value
72
>>> '{0:08b}'.format(value)
'01001000'
When you embed a bit into a byte, the lsb may or may not change. There are many ways to go about it, but the most direct is to zero out the lsb and then write your bit into it with an OR (very versatile if you also want to embed in multiple bits).
byte = (byte & 254) | my_bit
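The same AND-then-OR pattern extends to several low bits at once. A minimal Python 3 sketch, assuming you want to hide an n-bit value in the lowest n bits of a byte; `embed_bits` and `extract_bits` are hypothetical helper names, not from any library:

```python
def embed_bits(byte, value, n):
    """Replace the lowest n bits of byte (0-255) with value (0 <= value < 2**n)."""
    mask = (1 << n) - 1                  # n=2 -> 0b00000011
    cleared = byte & ~mask & 0xFF        # zero out the low n bits (n=1 gives 254)
    return cleared | (value & mask)      # OR the new bits in

def extract_bits(byte, n):
    """Return the lowest n bits of byte."""
    return byte & ((1 << n) - 1)

print('{0:08b}'.format(embed_bits(0b01001001, 0b10, 2)))  # 01001010
print(extract_bits(0b01001010, 2))                        # 2
```

With n=1 the cleared mask is exactly the 254 used above, so the one-bit case is a special case of this helper.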
You could also zero out the lsb with a right shift followed by a left shift, but this takes two operations instead of one.
byte = ((byte >> 1) << 1) | my_bit
Or you could check whether the lsb and your bit differ and, if so, flip the lsb with an XOR. However, this method needs a branch and is the least efficient.
if (byte & 1) != my_bit:
    byte = byte ^ 1
# no need to do anything if they are the same
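To confirm that the three approaches are interchangeable, a small Python 3 sketch can compare them over every possible byte value and both bit values (the function names are illustrative only):

```python
def lsb_and_or(byte, bit):
    # clear the lsb with mask 254 (0b11111110), then OR the bit in
    return (byte & 254) | bit

def lsb_shift(byte, bit):
    # the right shift drops the lsb; the left shift restores the position with lsb = 0
    return ((byte >> 1) << 1) | bit

def lsb_xor(byte, bit):
    # flip the lsb only when it differs from the bit
    if (byte & 1) != bit:
        byte ^= 1
    return byte

# exhaustive check: all 256 byte values, both bit values
for byte in range(256):
    for bit in (0, 1):
        assert lsb_and_or(byte, bit) == lsb_shift(byte, bit) == lsb_xor(byte, bit)
print("all three methods agree")
```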
So, all you need to do is convert your bytes to an array of integers. You could use [ord(byte) for byte in frame], but there are more efficient built-in ways. With bytearray() and bytes():
>>> frame = '\x0f\x02\x0e\x02\xf7\x00\xf7\x00T\xffT\xff'
>>> frame_bytes = bytearray(frame)
>>> frame_bytes[0]
15
>>> frame_bytes[0] = 14 # modify
>>> bytes(frame_bytes) # convert back to bytes
'\x0e\x02\x0e\x02\xf7\x00\xf7\x00T\xffT\xff'
With array.array() (this seems to be slightly slower for hundreds of thousands of bytes):
>>> import array
>>> frame = '\x0f\x02\x0e\x02\xf7\x00\xf7\x00T\xffT\xff'
>>> frame_bytes = array.array('B', frame)
>>> frame_bytes[0]
15
>>> frame_bytes[0] = 14 # modify
>>> frame_bytes.tostring() # convert back to bytes; in Python 3 use `tobytes()`
'\x0e\x02\x0e\x02\xf7\x00\xf7\x00T\xffT\xff'
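The sessions above are Python 2. In Python 3, plain string literals are text, so the sample frame has to be a bytes literal (b'...'); bytes() then returns a real bytes object and array.array uses tobytes(). A sketch of the same round trip under Python 3:

```python
import array

frame = b'\x0f\x02\x0e\x02\xf7\x00\xf7\x00T\xffT\xff'  # note the b prefix

# bytearray: indexing yields ints and the object is mutable
frame_bytes = bytearray(frame)
print(frame_bytes[0])        # 15
frame_bytes[0] = 14          # modify
print(bytes(frame_bytes))    # b'\x0e\x02\x0e\x02\xf7\x00\xf7\x00T\xffT\xff'

# array.array with typecode 'B' (unsigned char) behaves the same way
frame_arr = array.array('B', frame)
frame_arr[0] = 14
print(frame_arr.tobytes() == bytes(frame_bytes))  # True
```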
An example of embedding and extracting:
frame = '\x0f\x02\x0e\x02\xf7\xf7T\xffT\xff'
bits = [0, 0, 1, 1, 0]
# Embedding: clear the lsb of each byte, then OR in one secret bit
frame_bytes = bytearray(frame)
for i, bit in enumerate(bits):
    frame_bytes[i] = (frame_bytes[i] & 254) | bit
frame_modified = bytes(frame_bytes)
# Extraction: read back the lsb of the first len(bits) bytes
frame_bytes = bytearray(frame_modified)
extracted = [frame_bytes[i] & 1 for i in range(len(bits))]
assert bits == extracted
If your secret is a string or series of bytes, it's easy to convert them to a list of 1s and 0s.
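A minimal Python 3 sketch of that conversion, assuming the secret text is encoded as UTF-8 and each byte's bits are taken most significant first (both are choices, not requirements); `to_bits` and `from_bits` are hypothetical helper names:

```python
def to_bits(data):
    """Convert bytes to a list of 0/1 ints, most significant bit of each byte first."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]

def from_bits(bits):
    """Inverse of to_bits; len(bits) must be a multiple of 8."""
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit     # shift previous bits left, append the next one
        out.append(byte)
    return bytes(out)

secret = 'Hi'.encode('utf-8')            # text must be encoded to bytes first
bits = to_bits(secret)
print(bits)  # [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
print(from_bits(bits).decode('utf-8'))   # Hi
```

The extractor also has to know how many bits to read back, for example by embedding the secret's length first; that framing is left out here.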
Finally, make sure you don't modify any header data, as that may make the file unreadable.
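Reading the audio through the wave module is one way to respect that: readframes() returns only the sample data, and setparams() plus writeframes() write a fresh header for the output file. One more detail matters for 16-bit audio: PCM WAV samples are stored little-endian, so the least significant bit of each sample lives in the first byte of that sample, and writing into every byte would also flip high-order bits. The Python 3 sketch below therefore steps by the sample width. It assumes uncompressed PCM input; `embed_in_wav` and `extract_from_wav` are hypothetical names, and the demo writes its own small silent stereo file instead of `bird.wav`.

```python
import os
import tempfile
import wave

def embed_in_wav(in_path, out_path, bits):
    """Copy in_path to out_path, storing bits in the lsb of the first len(bits) samples."""
    with wave.open(in_path, 'rb') as src:
        params = src.getparams()
        frames = bytearray(src.readframes(src.getnframes()))  # audio data only, no header
    width = params.sampwidth
    if len(bits) * width > len(frames):
        raise ValueError('secret too long for this file')
    for i, bit in enumerate(bits):
        pos = i * width                   # low byte of sample i (little-endian samples)
        frames[pos] = (frames[pos] & 254) | bit
    with wave.open(out_path, 'wb') as dst:
        dst.setparams(params)             # the wave module writes the header
        dst.writeframes(bytes(frames))

def extract_from_wav(path, n_bits):
    """Read back the lsb of the first n_bits samples."""
    with wave.open(path, 'rb') as src:
        width = src.getsampwidth()
        frames = src.readframes(src.getnframes())
    return [frames[i * width] & 1 for i in range(n_bits)]

# Demo: 100 frames of silent 16-bit stereo audio (4 bytes per frame)
tmp = tempfile.mkdtemp()
cover = os.path.join(tmp, 'cover.wav')
with wave.open(cover, 'wb') as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(bytes(4 * 100))

stego = os.path.join(tmp, 'stego.wav')
bits = [0, 0, 1, 1, 0]
embed_in_wav(cover, stego, bits)
print(extract_from_wav(stego, len(bits)))  # [0, 0, 1, 1, 0]
```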
Upvotes: 1
Reputation: 55479
I'm not sure why you want to print those bytes in binary, but it's easy enough to do so.
You need to convert the bytes to integers and then format them with the str.format method; the old %-style formatting doesn't do binary.
The simple way to do that conversion is with the ord function, but for large numbers of bytes it's better to convert them in one hit by creating a bytearray.
#Some bytes, using hexadecimal escape codes
s = '\x01\x07\x0f\x35\xad\xff'
print ' '.join(['{0:08b}'.format(ord(c)) for c in s])
b = bytearray(s)
print ' '.join(['{0:08b}'.format(u) for u in b])
output
00000001 00000111 00001111 00110101 10101101 11111111
00000001 00000111 00001111 00110101 10101101 11111111
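As for the struct.unpack error in the question: readframes(n) returns n whole frames, and one frame of 16-bit stereo audio is 4 bytes (2 channels × 2 bytes), so the buffer is never the 2 bytes that '<H' describes. (Note also that readframes(i) in the loop reads i frames, not frame number i.) The format string has to cover every sample in the buffer. A Python 3 sketch, assuming 16-bit signed PCM (format code 'h'); the sample bytes are made up for illustration:

```python
import struct

# two frames of 16-bit stereo audio: 2 frames * 2 channels * 2 bytes = 8 bytes (made-up data)
frame = b'\x0f\x02\x0e\x02\xf7\x00T\xff'
nchannels, sampwidth = 2, 2

count = len(frame) // sampwidth                  # number of 16-bit samples in the buffer
nframes = count // nchannels                     # 2
samples = struct.unpack('<%dh' % count, frame)   # '<' little-endian, 'h' signed 16-bit
print(samples)  # (527, 526, 247, -172)

# binary view of each sample, masked to 16 bits so negatives show two's complement
print(' '.join('{0:016b}'.format(s & 0xFFFF) for s in samples))
```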
Generally, hexadecimal notation is more convenient to read than binary.
from binascii import hexlify
print hexlify(s)
print ' '.join(['%02X' % u for u in b])
print ' '.join(['%02X' % ord(c) for c in s])
print ' '.join(['{0:02X}'.format(ord(c)) for c in s])
output
01070f35adff
01 07 0F 35 AD FF
01 07 0F 35 AD FF
01 07 0F 35 AD FF
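Under Python 3 the same ideas apply with small changes: the data must be a bytes literal, iterating over it already yields integers (so no ord() is needed), and hexlify() returns bytes rather than str. A sketch:

```python
from binascii import hexlify

s = b'\x01\x07\x0f\x35\xad\xff'   # bytes literal in Python 3

print(hexlify(s))                                 # b'01070f35adff' (hexlify returns bytes)
print(s.hex())                                    # 01070f35adff (bytes.hex, Python 3.5+)
print(' '.join('{0:02X}'.format(u) for u in s))   # 01 07 0F 35 AD FF
print(' '.join('{0:08b}'.format(u) for u in s))   # binary, as in the first example
```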
I just saw your comment re steganography. The most convenient way to twiddle the bits of your bytes is to use a bytearray. You can easily convert a bytearray back to a string of bytes using the str function.
print hexlify(str(b))
output
01070f35adff
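The str() conversion is specific to Python 2. In Python 3, str() of a bytearray returns its printable representation (of the form "bytearray(b'...')") rather than the raw bytes, so bytes() is the call to use. A sketch that twiddles one bit and converts back:

```python
from binascii import hexlify

b = bytearray(b'\x01\x07\x0f\x35\xad\xff')
b[0] = b[0] & 254           # clear the lsb of the first byte: 0x01 -> 0x00

print(str(b))               # a representation such as bytearray(b'...'), not raw bytes
print(hexlify(bytes(b)))    # b'00070f35adff'
```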
The string formatting options are described in the official Python docs. For the old %-style formatting, see 5.6.2. String Formatting Operations. For the modern str.format options, see 7.1.3. Format String Syntax and 7.1.3.1. Format Specification Mini-Language.
In {0:08b}, the 0 before the colon is the field position (which can be omitted in Python 2.7 and 3.1 onwards). It says that we want to apply this formatting code to the first argument of .format, i.e., the argument with index zero. E.g.,
'{0} {2} {1}'.format('one', 'two', 'three')
returns
one three two
The b means we want to print a number in binary. The 08 means we want the output to be at least 8 characters wide, padded with zeros for numbers that have fewer than 8 binary digits.
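A few variations of the binary spec side by side, in Python 3 (the built-in format() and f-strings, Python 3.6+, accept the same spec after the colon):

```python
value = 5
print('{0:08b}'.format(value))   # 00000101  (field 0, width 8, zero padded)
print('{:08b}'.format(value))    # 00000101  (field number omitted)
print('{0:b}'.format(value))     # 101       (no width, so no padding)
print('{0:8b}'.format(value))    # "     101" (width 8, padded with spaces)
print(format(value, '08b'))      # 00000101  (built-in format() with the same spec)
print(f'{value:08b}')            # 00000101  (f-string, Python 3.6+)
```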
In %02X, the uppercase X means we want to print a number in hexadecimal, using the uppercase letters A-F for digits greater than 9; a lowercase x gives lowercase letters instead. The 02 means we want the output to be 2 characters wide, padded with zeros for numbers that need fewer than 2 hex digits.
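The %-style and str.format hex specs behave the same way here; a quick comparison in Python 3:

```python
value = 173                     # 0xAD
print('%02X' % value)           # AD
print('%02x' % value)           # ad
print('%02X' % 7)               # 07   (zero padded to 2 digits)
print('{0:02X}'.format(7))      # 07   (same spec in str.format)
print('%04X' % value)           # 00AD (wider field, more zero padding)
```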
Upvotes: 1