Reputation: 95
I am having problem with decoding byte string that I have to send from one computer to another. File is format PDF. I get error that goes:
fileStrings[i] = fileStrings[i].decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 648: invalid continuation byte
Any ideas of how to remove b' ' marking? I need to compile file back up, but i also need to know its size in bytes before sending it and I figured I will know it by decoding each byte string (Works for txt files but not for pdf ones..)
Code is:
with open(inputne, "rb") as file:
while 1:
readBytes= file.read(dataMaxSize)
fileStrings.append(readBytes)
if not readBytes:
break
readBytes= ''
filesize=0
for i in range(0, len(fileStrings)):
fileStrings[i] = fileStrings[i].decode()
filesize += len(fileStrings[i])
Edit: For anyone having same issue, parameter len() will give you size without b''.
Upvotes: 0
Views: 1729
Reputation: 35482
In Python, bytestrings are for raw binary data, and strings are for textual data. decode
tries to decode it as utf-8, which is valid for txt files, but not for pdf files, since they can contain random bytes. You should not try to get a string, since bytestrings are designed for this purpose. You can get the length of bytestrings like normal, with len(data)
. Many of the string operations also apply to bytestrings, such as concatenation and slicing (data1 + data2
and data[1:3]
).
As a side note, the b''
when you print it is just because the __str__
method for bytestrings is equivalent to repr
. It's not in the data itself.
Upvotes: 1