Matt

Reputation: 4387

Python3 socket cannot decode content

I'm facing a strange issue. I cannot decode the data received through a socket connection, while the same code works in Python 2.7. I know that the data received is a string in Python 2 and bytes in Python 3, but I don't understand why I get an error when I try to decode it. I'm sending exactly the same data (copy/pasted to be sure), except that in Python 3 I need to call .encode() on it to avoid "TypeError: a bytes-like object is required, not 'str'".
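In Python 3, str (text) and bytes are separate types, and socket methods only accept bytes, which is where that TypeError comes from. A minimal sketch, using a local socket.socketpair() as a stand-in for the real server connection (an assumption for illustration) and a hypothetical message:

```python
import socket

# A connected pair of local sockets stands in for the real client/server
# connection (illustration only; not part of the original setup).
a, b = socket.socketpair()

message = "hello"  # hypothetical payload, a str (text)

try:
    a.sendall(message)  # str is rejected by socket methods in Python 3
except TypeError as exc:
    print("TypeError:", exc)

a.sendall(message.encode("utf-8"))  # encode text -> bytes before sending
received = b.recv(4096)             # recv() always returns bytes
print(type(received), received.decode("utf-8"))  # decode bytes -> text

a.close()
b.close()
```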

Python 2:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(15)
s.connect((SERVERIP, SERVERPORT))
s.send(message)
data = ''
while True:
    new_data = s.recv(4096)
    if not new_data:
        break
    data += new_data
s.close()  # close only after the loop has read everything

Python 3:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(15)
s.connect((SERVERIP, SERVERPORT))
s.send(message.encode())  # a str must be encoded to bytes in Python 3
data = ''
while True:
    new_data = s.recv(4096)
    if not new_data:
        break
    data += new_data.decode('utf-8')  # same result with new_data.decode(); this is where the error is raised
s.close()

Python 2 new_data content:

'\x1f\x8b\x08\x00\x00\x00\x00\x00\x04\x00\x05\xc1\xdd\x12B@\x18\x00\xd0\x07r\xa3\xb6\xfdv]t\xa1T&\xb5d\x91\xd1tA\x98]F\xfeB\x1a\x0f\xdf9yu\x10s\xa3\xa29:\xdbl\xae\xe9\xe8\xd9H\xc8v\xa8\xd0K\x8c\xde\xd7\xef\xf9\xc4uf\xca\xfd \xdd\xb7\x0c\x9a\x84\xe9\xec\xb7\xf1\xf3\x97o\\k\xd5E\xc3\r\x11(\x9d{\xf7!\xdc*\x8c\xd5\x1c\x0b\xadG\xa5\x1e(\x97dO\x9b\x8f\x14\xaa\xddf\xd7I\x1e\xbb\xd4\xe7a\xe4\xe6a\x88\x8b\xf5\xa0\x08\xab\x11\xda\xea\xb8S\xf0\x98\x94\x1c\x9d\xa24>9\xbai\xd3\x1f\xe6\xcc`^\x91\xca\x02j\x1aLy\xccj\x0fdVn\x17@\xb0\xc1@\x80hX#\xb0\x06\n\x0b\xc0\xf2x\xfe\x01?\x05\x1f\xc1\xc5\x00\x00\x00'

Python3 new_data content:

b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x04\x00\x05\xc1\xdb\x12B@\x00\x00\xd0\x0f\xf2\xc0\xda\xb5\xcbC\x0f"-\xb9gPM\x0f\x85&\x8b)\xb7\x1d\x1a\x1f\xdf9\xe3\xbc\xbe\xfd\x9e\xd9A\xe3:\x851,\xcf\xc4\xe5\x865|\xa5\xcb\xbb\xcbs\xa8\x8f\xcc\x1b\xf7\x06\xc5\x8f\xfa\xba\x84\xd8>\xea\xc0\xa5b\xe6\xceC\xea\xd0\x88\xebM\t\xd7\xf8\xc1*#hI\xd6F\x80\xb3B[\xa7\x99\x91\xbe\x16%Q\xf5\x1d(\xa0\x93\x87\n\x13\xbe\x92\x91\xcc\xbfT\x98b\xd3\x0b=\xc0\xd5\xb3\xdf}\xcc\xc9\xb1\xe4\'\xb1\xe25\xcc{tl\xe5\x92\xf34x\xd5\xa1\xf9K\xa4\xa8k\xa8 dU\xd7\x1e\xce\xb4\x02\xean\xc3\x10#\x05\x13L\x14\xa0(H\xd2d\xb8a\xbc\xdd\xee\x7f\x1b\xe5\xf1\xd2\xc5\x00\x00\x00'

So in Python 3, I get this error when I try to decode:

'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

The received data is not the same: the difference starts after '\x12B@'. Does anyone have an explanation?

I don't manage the server side, so please don't ask me to check it!

Thanks,

Matthieu

Upvotes: 0

Views: 2149

Answers (1)

Duncan

Reputation: 95652

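In Python 3 you need to work with bytes: the data you have is not a text string, so don't try to interpret it as one. The first two bytes, \x1f\x8b, are the gzip magic number, so the payload is gzip-compressed binary data. In UTF-8, 0x8b can only appear as a continuation byte, never as the start of a character, which is exactly what "invalid start byte ... in position 1" is reporting.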
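To illustrate the error, here is a minimal sketch that compresses some hypothetical sample bytes with gzip.compress() (standing in for the server's payload, which is assumed to be gzip because of its header) and then tries to decode the result as UTF-8:

```python
import gzip

# Hypothetical sample data; the real payload is assumed to be gzip-compressed
# in the same way, judging by its \x1f\x8b header.
payload = gzip.compress(b"field1|field2")

# Every gzip stream starts with the two magic bytes 0x1f 0x8b.
print(payload[:2] == b"\x1f\x8b")

try:
    payload.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0x1f is valid (it is plain ASCII), but 0x8b is only legal as a
    # continuation byte in UTF-8, so decoding fails at position 1.
    print(exc.start, hex(payload[exc.start]))
```

Checking the first two bytes like this is a cheap way to confirm what the server is sending before deciding how to handle it.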

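import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(15)
s.connect((SERVERIP, SERVERPORT))
s.send(message)  # message must already be bytes, e.g. message.encode()
data = b''  # accumulate raw bytes, not str
while True:
    new_data = s.recv(4096)
    if not new_data:  # empty bytes: the server closed the connection
        break
    data += new_data  # keep the bytes as they are; no decode here
s.close()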

That should be all you need to receive the data: start with an empty bytes object, created with b'' or just bytes(). Keep in mind that you are still working with bytes when you come to process the data, so that code will probably need changing as well.
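If you receive data like this in several places, the loop can be wrapped in a small helper. A minimal sketch, where the name recv_all is hypothetical and a local socket.socketpair() stands in for the real server; it collects the chunks in a list and joins them once at the end:

```python
import socket


def recv_all(sock: socket.socket, bufsize: int = 4096) -> bytes:
    """Hypothetical helper: read until the peer closes, return raw bytes."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # b'' means the peer has closed its side
            break
        chunks.append(chunk)
    return b"".join(chunks)  # join once instead of repeated +=


# Local demo: a socketpair stands in for the real server (assumption).
server, client = socket.socketpair()
server.sendall(b"\x1f\x8b" + b"x" * 1000)  # arbitrary binary payload
server.close()  # closing signals end-of-data to the reader
data = recv_all(client)
client.close()
print(len(data))
```

Joining once avoids copying an ever-growing bytes object on every +=; for a payload of a few hundred bytes like the one in the question, either approach is fine.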

Your next step in processing this is probably:

import gzip
text = gzip.decompress(data)  # note: the result is still bytes

and at this point it may be appropriate to change that to:

text = gzip.decompress(data).decode('ascii')

Use whatever encoding is appropriate here. The sample data you posted above contains only ASCII once decompressed, so that may be all you need; otherwise you might need UTF-8 or some other encoding, but you'll have to find out which encoding was actually used rather than guess. It also looks like the data contains pipe-separated fields, so you might want to split the fields first and then decode or otherwise process each one individually:

fields = gzip.decompress(data).split(b'|')
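Putting the processing steps together, here is a minimal sketch on a hypothetical pipe-separated record (the real field layout and encoding are unknown; ASCII is assumed here, following the observation about the sample data):

```python
import gzip

# Hypothetical record standing in for the server's response; the real field
# layout and encoding must be confirmed with whoever runs the server.
raw = gzip.compress(b"OK|42|hello")

decompressed = gzip.decompress(raw)               # still bytes
fields = decompressed.split(b"|")                 # split on bytes b'|', not str '|'
decoded = [f.decode("ascii") for f in fields]     # decode each field separately
print(decoded)
```

Splitting before decoding keeps any binary or differently encoded field intact until you know how it should be handled.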

Upvotes: 1
