Reputation: 626
I want to download an example image from an HTTP server using only the methods defined in the HTTP protocol (and sockets, of course).
I tried to implement it, but my code does not seem to download the whole image, whether or not I use the while loop.
An example image is here: https://httpbin.org/image/png.
My code downloads only part of the image, and I do not know how to fix it. I do not want to use any libraries such as urllib; I want to use just sockets.
Any ideas?
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('httpbin.org', 80))
s.sendall('GET /image/png HTTP/1.1\r\nHOST: httpbin.org\r\n\r\n')
reply = ""
while True:
    data = s.recv(2048)
    if not data: break
    reply += data
# get image size
size = -1
tmp = reply.split('\r\n')
for line in tmp:
    if "Content-Length:" in line:
        size = int(line.split()[1])
        break
headers = reply.split('\r\n\r\n')[0]
image = reply.split('\r\n\r\n')[1]
# save image
f = open('image.png', 'wb')
f.write(image)
f.close()
Upvotes: 1
Views: 5035
Reputation: 5954
import socket
import select
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('httpbin.org', 80))
s.sendall(b'GET /image/png HTTP/1.1\r\nHOST: httpbin.org\r\n\r\n')
reply = b''
while select.select([s], [], [], 3)[0]:
    data = s.recv(2048)
    if not data: break
    reply += data
headers = reply.split(b'\r\n\r\n')[0]
image = reply[len(headers)+4:]
# save image
f = open('image.png', 'wb')
f.write(image)
f.close()
Note that this example is not perfect. A more elegant approach would be to check the Content-Length header and recv exactly that many bytes of body, instead of hard-coding 3 seconds as the timeout. And if the server uses chunked encoding, it becomes even more complicated.
--
The example is written for Python 3.
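For reference, the Content-Length approach mentioned above could look roughly like the sketch below. It is only a sketch under some assumptions: the server sends a Content-Length header and does not use chunked encoding, and the helper names read_response and download are made up for illustration.

```python
import socket


def read_response(sock):
    """Read one HTTP response whose body length is given by Content-Length."""
    buf = b''
    # Keep reading until the blank line that ends the header block.
    while b'\r\n\r\n' not in buf:
        data = sock.recv(2048)
        if not data:
            raise ConnectionError('connection closed before headers were complete')
        buf += data
    head, _, body = buf.partition(b'\r\n\r\n')

    # Header names are case-insensitive, so compare in lower case.
    length = None
    for line in head.split(b'\r\n')[1:]:
        name, _, value = line.partition(b':')
        if name.strip().lower() == b'content-length':
            length = int(value.strip())
            break
    if length is None:
        # Assumption of this sketch: chunked or close-delimited bodies are not handled.
        raise ValueError('no Content-Length header in response')

    # Receive until the whole body has arrived, without waiting for the server to close.
    while len(body) < length:
        data = sock.recv(min(2048, length - len(body)))
        if not data:
            raise ConnectionError('connection closed before body was complete')
        body += data
    return head, body[:length]


def download(host, path, filename):
    """Hypothetical helper: fetch path from host on port 80 and save the body."""
    with socket.create_connection((host, 80)) as s:
        request = 'GET {} HTTP/1.1\r\nHost: {}\r\n\r\n'.format(path, host)
        s.sendall(request.encode('ascii'))
        head, body = read_response(s)
    with open(filename, 'wb') as f:
        f.write(body)
    return head
```

With this, download('httpbin.org', '/image/png', 'image.png') should return as soon as the last body byte arrives, with no timeout involved.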
Upvotes: 2
Reputation: 123461
You are making an HTTP/1.1 request. This HTTP version implicitly behaves as if Connection: keep-alive were set. This means the server might not close the TCP connection immediately after sending the response, as your code expects, but might keep the connection open to wait for more HTTP requests.
If you replace the version with HTTP/1.0, the server closes the connection after the response has been sent and the image is complete, because HTTP/1.0 implies Connection: close.
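A minimal Python 3 sketch of that change, assuming the server honours HTTP/1.0 and closes the connection once the response is sent. The helper names build_request, recv_until_close and fetch are only illustrative:

```python
import socket


def build_request(host, path):
    # HTTP/1.0: the server is expected to close the connection after the response.
    return 'GET {} HTTP/1.0\r\nHost: {}\r\n\r\n'.format(path, host).encode('ascii')


def recv_until_close(sock):
    chunks = []
    while True:
        data = sock.recv(2048)
        if not data:  # an empty result means the peer closed the connection
            break
        chunks.append(data)
    return b''.join(chunks)


def fetch(host, path):
    with socket.create_connection((host, 80)) as s:
        s.sendall(build_request(host, path))
        reply = recv_until_close(s)
    # Split only at the first blank line; the image data may itself contain \r\n\r\n.
    head, _, body = reply.partition(b'\r\n\r\n')
    return head, body
```

Calling fetch('httpbin.org', '/image/png') and writing body to a file opened in 'wb' mode should then give the complete image, since the read loop ends when the server closes the socket.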
Apart from that: HTTP is way more complex than you might think. Please don't just design your code after some example messages you've seen somewhere but actually read and follow the standards if you really want to implement HTTP yourself.
Upvotes: 2