Reputation: 910
I get UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 0: invalid continuation byte
When I try to call codecs.decode(X, 'utf-8')
where X = b'\xe8\xd0\xca@\xee\xe4\xca\xc6\xd6@\xde\xcc@\xe8\xd0\xca@\xd0\xca\xe6\xe0\xca\xe4\xea\xe6\x14\xc4\xf2@\xd0\xca\xdc\xe4\xf2@\xee\xc2\xc8\xe6\xee\xde\xe4\xe8\xd0@\xd8\xde\xdc\xce\xcc\xca\xd8\xd8\xde\xee\x14\x14\xd2\xe8@\xee\xc2\xe6@\xe8\xd0\xca@\xe6\xc6\xd0\xde\xde\xdc\xca\xe4@\xd0\xca\xe6\xe0\xca\xe4\xea\xe6\x14@@@@@@\xe8\xd0\xc2\xe8@\xe6\xc2\xd2\xd8\xca\xc8@\xe8\xd0\xca@\xee\xd2\xdc\xe8\xe4\xf2@\xe6\xca\xc2\x14\xc2\xdc\xc8@\xe8\xd0\xca@\xe6\xd6\xd2\xe0\xe0\xca\xe4@\xd0\xc2\xc8@\xe8\xc2\xd6\xca\xdc@\xd0\xd2\xe6@\xd8\xd2\xe8\xe8\xd8\xca@\xc8\xc2\xea\xce\xd0\xe8\xca\xe4\x14@@@@@@\xe8\xde@\xc4\xca\xc2\xe4@\xd0\xd2\xda@\xc6\xde\xda\xe0\xc2\xdc\xf2\\\x14\x14\xc4\xd8\xea\xca@\xee\xca\xe4\xca@\xd0\xca\xe4@\xca\xf2\xca\xe6@\xc2\xe6@\xe8\xd0\xca@\xcc\xc2\xd2\xe4\xf2Z\xcc\xd8\xc2\xf0\x14@@@@@@\xd0\xca\xe4@\xc6\xd0\xca\xca\xd6\xe6@\xd8\xd2\xd6\xca@\xe8\xd0\xca@\xc8\xc2\xee\xdc@\xde\xcc@\xc8\xc2\xf2\x14\xc2\xdc\xc8@\xd0\xca\xe4@\xc4\xde\xe6\xde\xda@\xee\xd0\xd2\xe8\xca@\xc2\xe6@\xe8\xd0\xca@\xd0\xc2\xee\xe8\xd0\xde\xe4\xdc@\xc4\xea\xc8\xe6\x14@@@@@@\xe8\xd0\xc2\xe8@\xde\xe0\xca@\xd2\xdc@\xe8\xd0\xca@\xda\xde\xdc\xe8\xd0@\xde\xcc@\xda\xc2\xf2\\\x14\x14\xe8\xd0\xca@\xe6\xd6\xd2\xe0\xe0\xca\xe4@\xd0\xca@\xe6\xe8\xde\xde\xc8@\xc4\xca\xe6\xd2\xc8\xca@\xe8\xd0\xca@\xd0\xca\xd8\xda\x14@@@@@@\xd0\xd2\xe6@\xe0\xd2\xe0\xca@\xee\xc2\xe6@\xd2\xdc@\xd0\xd2\xe6@\xda\xde\xea\xe8\xd0\x14\xc2\xdc\xc8@\xd0\xca@\xee\xc2\xe8\xc6\xd0\xca\xc8@\xd0\xde\xee@\xe8\xd0\xca@\xec\xca\xca\xe4\xd2\xdc\xce@\xcc\xd8\xc2\xee@\xc8\xd2\xc8@\xc4\xd8\xde\xee\x14@@@@@@\xe8\xd0\xca@\xe6\xda\xde\xd6\xca@\xdc\xde\xee@\xee\xca\xe6\xe8@\xdc\xde\xee@\xe6\xde\xea\xe8\xd0\\\x14\x14\xe8\xd0\xca\xdc@\xea\xe0@\xc2\xdc\xc8@\xe6\xe0\xc2\xd6\xca@\xc2\xdc@\xde\xd8\xc8@\xe6\xc2\xd2\xd8\xde\xe4\x14@@@@@@\xd0\xc2\xc8@\xe6\xc2\xd2\xd8\xca\xc8@\xe8\xde@\xe8\xd0\xca@\xe6\xe0\xc2\xdc\xd2\xe6\xd0@\xda\xc2\xd2\xdc\x14\xd2@\xe0\xe4\xc2\xf2@\xe8\xd0\xca\xca@\xe0\xea\xe8@\xd2\xdc\xe8\xde@\xf2\xde\xdc\xc8\xca\xe4@\xe0\xde\xe4\xe8\x14@@@@@@\xcc\xde\xe4@\xd2@\xcc\xca\xc2\xe4@\xc2@\xd0\xea\xe4\xe4\xd2\xc6\xc2\xdc\xca\\\x14\x14\xd8\xc2\xe6\xe8@\xdc\xd2\xce\xd0\xe8@\xe8\xd0\xca@\xda\xde\xde\xdc@\xd0\xc2\xc8@\xc2@\xce\xde\xd8\xc8\xca\xdc@\xe4\xd2\xdc\xce\x14@@@@@@\xc2\xdc\xc8@\xe8\xdeZ\xdc\xd2\xce\xd0\xe8@\xdc\xde@\xda\xde\xde\xdc@\xee\xca@\xe6\xca\xca\x14\xe8\xd0\xca@\xe6\xd6\xd2\xe0\xe0\xca\xe4@\xd0\xca@\xc4'
I also tried to use binascii.unhexlify('%x' % (int('0b' + bNum, 2))).decode('utf-8')
where bNum
is a long binary string
The text was originally from a utf-8 encoded .txt
file
EDIT: Lets say we have two bit strings, the first is the exact bit string from converting some text to a bit string. The second is extracted from an image. The second is exactly the same as the first up to the point where it was cut off because the image it was being hidden in didn't have enough pixels.
example: http://pastebin.com/NnaH9dEb
why would it throw UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 0: invalid continuation byte
error if they both contain the same data up to the point the second one cuts off?
EDIT2: when I convert the two bit strings to hex via hex(int(<var name>, 2))
I get different results, but converting only the first couple of bytes returns the same result.
Upvotes: 0
Views: 951
Reputation: 178399
The decode of decMsg
is misaligned. If I add 7 zero bits to the end of the message or truncate the last bit, it decodes with my method. Your code was TL;DR.
import math
initMsg = '11101000110100001100101...' # truncated due post limits.
decMsg = '11101000110100001100101...'
# Only printing the first 25 chars of the message for bevity:
a = int(initMsg,2)
print(a.to_bytes(math.ceil(a.bit_length()/8),'big')[:25])
a = int(decMsg,2)
print(a.to_bytes(math.ceil(a.bit_length()/8),'big')[:25])
a = int(decMsg+'0000000',2)
print(a.to_bytes(math.ceil(a.bit_length()/8),'big')[:25])
a = int(decMsg[:-1],2)
print(a.to_bytes(math.ceil(a.bit_length()/8),'big')[:25])
Output:
b'the wreck of the hesperus'
b'\xe8\xd0\xca@\xee\xe4\xca\xc6\xd6@\xde\xcc@\xe8\xd0\xca@\xd0\xca\xe6\xe0\xca\xe4\xea\xe6'
b'the wreck of the hesperus'
b'the wreck of the hesperus'
Compare \xe8
to t
in binary:
>>> format(ord('t'),'08b')
'01110100'
>>> format(0xe8,'08b')
'11101000'
Upvotes: 2