I know a lot of people online have had problems with string encodings in Python, but no matter what I try, I can't fix mine. I'm using TCP sockets to connect to a web server and send it an HTTP request. I read the response in a series of 1024-byte buffers, decode each one, and concatenate them into the complete response as a string. When I get the response, however, I get UnicodeDecodeErrors. I want my program to work with many different websites, so is there a solution that works with just about any site I give it?
Thank you for your time.
Some code:
def getAllFromSocket(socket):
    '''Reads all data from a socket and returns a string of it.'''
    more_bytes = True
    message = ''
    if socket is not None:
        while more_bytes:
            buffer = socket.recv(1024)
            if len(buffer) == 0:
                more_bytes = False
            else:
                # each 1024-byte chunk is decoded on its own
                message += buffer.decode('utf-8')
    return message
So when I do this:
received_message = getAllFromSocket(my_sock)
I get:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd0 in position 1023: unexpected end of data
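The reported position, 1023, is the last byte of a 1024-byte chunk, and 0xd0 is a UTF-8 lead byte that must be followed by a continuation byte. The sketch below reproduces the same error without a socket. It assumes a hypothetical response body and that recv(1024) happens to split a two-byte character; `body` and `chunk` are illustrative names, not part of the original code.

```python
# Hypothetical response body: 1023 ASCII bytes followed by Cyrillic "П",
# whose UTF-8 encoding is the two bytes 0xd0 0x9f.
body = b"a" * 1023 + "П".encode("utf-8")

# Assume recv(1024) returned exactly the first 1024 bytes,
# so the chunk ends with the lone lead byte 0xd0.
chunk = body[:1024]

error = None
try:
    chunk.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)  # 'utf-8' codec can't decode byte 0xd0 in position 1023: unexpected end of data

# Decoding the complete byte string in one go succeeds.
text = body.decode("utf-8")
```

The bytes themselves are valid UTF-8; only the per-chunk decoding fails.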
You can try detecting the encoding of the data with UnicodeDammit (part of Beautiful Soup), and make sure you're actually receiving UTF-8. You can also choose to ignore errors:
buffer.decode("utf-8", "ignore")
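Note that `"ignore"` also silently drops any character whose bytes are split across two `recv()` calls. Because the error is at position 1023, the last byte of a 1024-byte chunk, decoding chunk by chunk is the likely culprit. A minimal sketch that avoids this uses an incremental decoder from the standard `codecs` module, which holds an incomplete multi-byte sequence until the next chunk arrives. It assumes the body is UTF-8; a real client should take the charset from the `Content-Type` header. `get_all_from_socket` and its `encoding` parameter are hypothetical names.

```python
import codecs
import socket
from typing import Optional


def get_all_from_socket(sock: Optional[socket.socket], encoding: str = "utf-8") -> str:
    """Read until the peer closes the connection and return the decoded text.

    The incremental decoder buffers a multi-byte sequence that is split across
    two recv() calls instead of raising UnicodeDecodeError.
    """
    if sock is None:
        return ""
    # errors="replace" turns genuinely invalid bytes into U+FFFD instead of raising
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    parts = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:  # empty bytes: the peer closed the connection
            break
        parts.append(decoder.decode(chunk))
    # final=True flushes anything still buffered; a truncated tail becomes U+FFFD
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)
```

A simpler alternative is to collect the raw chunks in a list, join them with `b"".join(...)`, and call `.decode()` once after the loop.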