Hudson Worden

Reputation: 2353

Python Decoding/Encoding Problems

I know a lot of people have reported problems with string encodings in Python, but no matter what I try, I can't figure out how to fix mine. I'm using TCP sockets to connect to a web server and send it an HTTP request. I read the response into a series of buffers, decode each buffer, and concatenate the results into one complete response string. When I read the response, however, I get a UnicodeDecodeError. I want to use my program on many different websites, so is there a solution that would work with just about any site I give it?

Thank you for your time.

Some code:

def getAllFromSocket(sock):
    '''Reads all data from a socket and returns a string of it.'''
    more_bytes = True
    message = ''
    if sock is not None:
        while more_bytes:
            buffer = sock.recv(1024)
            if len(buffer) == 0:
                # An empty read means the server closed the connection.
                more_bytes = False
            else:
                # Each chunk of up to 1024 bytes is decoded on its own.
                message += buffer.decode('utf-8')
    return message

So when I do this:

received_message = getAllFromSocket(my_sock)

I get:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xd0 in position 1023: unexpected end of data

Upvotes: 1

Views: 3039

Answers (1)

Vlad the Impala

Reputation: 15872

You can try detecting the encoding of the data with UnicodeDammit (part of Beautiful Soup). Make sure you're actually getting UTF-8. You can also choose to ignore decoding errors:

buffer.decode("utf-8", "ignore")
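Note that the error in the question is reported at position 1023, the last byte of a 1024-byte chunk, with "unexpected end of data". That pattern suggests a multi-byte UTF-8 character was cut in half by the chunk boundary, in which case "ignore" would silently drop both halves of that character. A minimal sketch of two ways around it, assuming the response really is UTF-8: collect the raw bytes and decode once at the end, or feed the chunks through an incremental decoder. The names recv_all and decode_chunks are made up for this example, and errors="replace" is just one possible choice.

```python
import codecs


def recv_all(sock, bufsize=1024):
    """Read until the peer closes the connection and return the raw bytes."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # an empty read means the connection was closed
            break
        chunks.append(chunk)
    return b"".join(chunks)


def decode_chunks(chunks, encoding="utf-8", errors="replace"):
    """Decode an iterable of byte chunks, keeping split characters intact."""
    decoder = codecs.getincrementaldecoder(encoding)(errors=errors)
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True flushes any incomplete sequence left at the very end.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# b"\xd0\x94" is one character in UTF-8; here it straddles two chunks,
# which is the situation a fixed-size recv() can produce.
chunks = [b"abc\xd0", b"\x94def"]
text = decode_chunks(chunks)
```

With the first approach you would call recv_all(my_sock).decode("utf-8", "replace") once, after all the data has arrived. Neither approach helps if the site is not sending UTF-8 at all; for HTTP, the charset in the Content-Type header (or a detector such as UnicodeDammit) is where the real encoding would come from.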

Upvotes: 1
