Reputation: 17
Hey, I am trying to pull an image from a web server using socket programming in Python. While going through the Python for Everybody book, I found an example in the networked programming chapter and copied the code from the example urljpeg.py:
import socket
import time
#HOST = 'data.pr4e.org'
#PORT = 80
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
mysock.sendall(b'GET http://data.pr4e.org/cover3.jpg HTTP/1.0\r\n\r\n')
count = 0
picture = b""
while True:
    data = mysock.recv(5120)
    if len(data) < 1: break
    # time.sleep(0.25)
    count = count + len(data)
    print(len(data), count)
    picture = picture + data
mysock.close()
# look for the end of the header (2crlf)
pos = picture.find(b"r\n\r\n")
print("Header length ", pos)
print(picture[:pos].decode())
# skip past the header and save the picture data
picture = picture[pos+4:]
fhand = open("stuff.jpg","wb")
fhand.write(picture)
fhand.close()
Upvotes: 0
Views: 1256
Reputation: 6796
The error message indicates that you are trying to decode data which is not utf-8. So why is this happening? Let's take a step back and look at what the code is doing:
# look for the end of the header (2crlf)
pos = picture.find(b"r\n\r\n")
print("Header length ", pos)
print(picture[:pos].decode())
We're trying to find a sequence of \r\n\r\n, i.e. CR LF CR LF, in the data. This would be the empty line that separates the HTTP header (which should be in ASCII, which is a subset of UTF-8) from the actual image data. Then we try to decode everything up to that point as a string. So why does it fail? The program conveniently prints the header length, and in the bit you posted earlier we could see that this was -1, which means that the picture.find call did not find anything! With pos equal to -1, picture[:pos] is the whole response minus its last byte, so decode() is handed the binary image data as well. Why did the search fail? Well, look carefully at what the code actually does:
# look for the end of the header (2crlf)
pos = picture.find(b"r\n\r\n")
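It should be looking for \r\n\r\n, but it is actually looking for r\n\r\n! The backslash before the first r is missing, so the pattern starts with a literal letter r. Changing the call to picture.find(b"\r\n\r\n") fixes it.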
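To see the difference without touching the network, here is a minimal sketch that runs both patterns against a made-up response. The sample bytes and the helper name split_response are my own assumptions for illustration, not part of the book's example.

```python
# Made-up HTTP response: ASCII header, blank line, then fake JPEG bytes.
# (Illustrative only; this is not the real cover3.jpg response.)
response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: image/jpeg\r\n"
    b"\r\n"
    b"\xff\xd8\xff\xe0fake-jpeg-data\xff\xd9"
)

# Buggy pattern from the question: begins with a literal 'r', so no match.
bad_pos = response.find(b"r\n\r\n")
print("buggy pattern position:", bad_pos)

# With -1, the slice covers everything except the last byte, binary included.
try:
    response[:bad_pos].decode()
    decode_failed = False
except UnicodeDecodeError as err:
    decode_failed = True
    print("decode failed:", err.reason)


def split_response(raw):
    """Hypothetical helper: split raw HTTP bytes into (header text, body bytes)."""
    pos = raw.find(b"\r\n\r\n")  # CR LF CR LF marks the end of the header
    if pos == -1:
        raise ValueError("no blank line found; header is incomplete")
    return raw[:pos].decode(), raw[pos + 4:]


header, body = split_response(response)
print(header)
print("body length:", len(body))
```

Checking for -1 before slicing turns a confusing decode error into an explicit message about the missing header separator.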
Upvotes: 1