sourabh patel

Reputation: 17

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 398: invalid start byte (Python for Everybody book)

Hey, I am trying to pull an image from a web server using socket programming in Python. I am working through the Python for Everybody book, and the networked programming chapter has an example, urljpeg.py, whose code I copied:

import socket 
import time 
#HOST = 'data.pr4e.org'
#PORT = 80

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

mysock.connect(('data.pr4e.org', 80))

mysock.sendall(b'GET http://data.pr4e.org/cover3.jpg HTTP/1.0\r\n\r\n')
count = 0
picture = b""

while True:
    data = mysock.recv(5120)
    if len(data) < 1: break
# time .sleep(0.25)
    count = count + len(data)
    print( len(data),count)
    picture = picture + data

mysock.close()


# look for the end of the header (2crlf)

pos = picture.find(b"r\n\r\n")
print("Header length ", pos)
print(picture[:pos].decode())

# skip past the header and save the picture data
picture = picture[pos+4:]
fhand = open("stuff.jpg","wb")
fhand.write(picture)
fhand.close()

Upvotes: 0

Views: 1256

Answers (1)

Ture Pålsson

Reputation: 6796

The error message indicates that you are trying to decode data which is not utf-8. So why is this happening? Let's take a step back and look at what the code is doing:

# look for the end of the header (2crlf)
pos = picture.find(b"r\n\r\n")
print("Header length ", pos)
print(picture[:pos].decode())

We're trying to find a sequence of \r\n\r\n, i.e. CR LF CR LF in the data. This would be the empty line that separates the HTTP header (which should be in ASCII, which is a subset of UTF-8) from the actual image data. Then we try to decode everything up to that point as a string. So why does it fail? The program conveniently prints the header length, and in the bit you posted earlier we could see that this was -1, which means that the picture.find call did not find anything! Why not? Well, look carefully at what the code actually does:

# look for the end of the header (2crlf)
pos = picture.find(b"r\n\r\n")

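It should be looking for \r\n\r\n, but it is actually looking for r\n\r\n! The backslash before the first r is missing, so the pattern starts with a literal letter r instead of a carriage return. Since find returns -1, picture[:pos] becomes picture[:-1], which is everything except the last byte. That slice includes the JPEG data, and JPEG data starts with the byte 0xff, which is not a valid start byte in UTF-8. Hence the error.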
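To see the difference without touching the network, here is a minimal sketch that runs both searches on a made-up response. The helper name split_response and the sample bytes are my own, for illustration only; they are not from the book or from the real server's output.

```python
def split_response(raw: bytes):
    """Split a raw HTTP response into (header text, body bytes)."""
    # Note the leading backslash: CR LF CR LF marks the end of the header.
    pos = raw.find(b"\r\n\r\n")
    if pos == -1:
        # Fail loudly instead of silently slicing with -1.
        raise ValueError("no blank line found; header is incomplete")
    header = raw[:pos].decode("ascii", errors="replace")
    body = raw[pos + 4:]  # skip the four separator bytes
    return header, body


# Hypothetical response: a short header followed by fake JPEG bytes.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: image/jpeg\r\n"
    b"\r\n"
    b"\xff\xd8\xff\xe0fake-jpeg-bytes"
)

# The typo from the question: the pattern is never found.
print("typo search:", sample.find(b"r\n\r\n"))

header, body = split_response(sample)
print("Header length", len(header))
print(header)
print("body starts with:", body[:2])
```

Checking for -1 before slicing is a design choice on my part: it turns a confusing decode error into a message that points at the real problem.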

Upvotes: 1
