Dudo

Reputation: 95

Python unable to decode byte string

I am having a problem decoding a byte string that I have to send from one computer to another. The file is a PDF. I get this error:

fileStrings[i] = fileStrings[i].decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 648: invalid continuation byte

Any ideas on how to remove the b' ' marking? I need to reassemble the file on the other side, but I also need to know its size in bytes before sending it, and I figured I could get that by decoding each byte string (this works for txt files but not for PDF files).

Code is:

    with open(inputne, "rb") as file:
        while 1:
            readBytes= file.read(dataMaxSize)
            fileStrings.append(readBytes)
            if not readBytes:
                break
            readBytes= ''
    
    filesize=0
    for i in range(0, len(fileStrings)):
        fileStrings[i] = fileStrings[i].decode()
        filesize += len(fileStrings[i])

Edit: For anyone having the same issue, calling len() on the byte string gives you its size in bytes directly; the b'' is not counted.

Upvotes: 0

Views: 1729

Answers (1)

Aplet123

Reputation: 35482

In Python, bytestrings (bytes) hold raw binary data, and strings (str) hold text. Calling decode() with no arguments tries to interpret the bytes as UTF-8, which works for txt files that contain UTF-8 text but not for PDF files, since they can contain arbitrary bytes that are not valid UTF-8. You should not try to get a string here; bytestrings are designed for exactly this purpose. You can get the length of a bytestring as usual with len(data), which is the number of bytes. Many string operations also apply to bytestrings, such as concatenation and slicing (data1 + data2 and data[1:3]).
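As a rough sketch of this, assuming the goal is just to split the file into chunks and know the total size: the helper name read_chunks and the 1024-byte chunk size are made up for illustration (dataMaxSize from the question would play the same role), and the demo file is a stand-in for a real PDF.

```python
import os
import tempfile

def read_chunks(path, chunk_size=1024):
    """Read a file in binary mode and return its contents as a list of bytes chunks."""
    chunks = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            chunks.append(chunk)
    return chunks

# Demo file with bytes that are not valid UTF-8, as a PDF may contain.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"%PDF-1.4\n" + bytes([0xDA, 0xFF, 0x00]) * 1000)
    demo_path = tmp.name

chunks = read_chunks(demo_path)
filesize = sum(len(chunk) for chunk in chunks)  # size in bytes, no decoding needed
rebuilt = b"".join(chunks)  # reassembling the file on the receiving side
os.remove(demo_path)
```

Nothing is ever decoded here, so the content of the file does not matter; the chunks can be sent as they are and joined back together with b"".join.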

As a side note, the b'' you see when you print a bytestring appears only because its __str__ method is equivalent to repr. It's not part of the data itself.
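A small illustration of that point, using a made-up five-byte value rather than a real file:

```python
data = bytes([0x25, 0x50, 0x44, 0x46, 0xDA])  # the bytes b'%PDF\xda'

shown = str(data)   # the text that print(data) displays
print(shown)        # b'%PDF\xda'
print(len(data))    # 5: the b'' wrapper is not counted
print(data[0])      # 37, the byte for '%', not the letter 'b'

# Decoding fails because 0xDA is not valid UTF-8 here.
try:
    data.decode()  # defaults to UTF-8
except UnicodeDecodeError as exc:
    decode_error = exc
```

The b and the quotes exist only in the printed representation; indexing and len() operate on the five bytes alone.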

Upvotes: 1
