mirx

Reputation: 626

How to download image from HTTP server [Python/sockets]

I want to download an example image from an HTTP server using only the methods defined in the HTTP protocol (and sockets, of course).

I tried to implement it, but my code does not seem to download the whole image, whether or not I use the while loop.

An example image is here: https://httpbin.org/image/png.

My code downloads only part of the image, and I do not know how to fix it. I do not want to use any libraries such as urllib; I want to use just sockets.

Any ideas?

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('httpbin.org', 80))
s.sendall('GET /image/png HTTP/1.1\r\nHOST: httpbin.org\r\n\r\n')

reply = ""

while True:
    data = s.recv(2048)
    if not data: break
    reply += data

# get image size
size = -1
tmp = reply.split('\r\n')
for line in tmp:
   if "Content-Length:" in line:
      size = int(line.split()[1])
      break

headers =  reply.split('\r\n\r\n')[0]
image = reply.split('\r\n\r\n')[1]

# save image
f = open('image.png', 'wb')
f.write(image)
f.close()

Upvotes: 1

Views: 5035

Answers (2)

cshu

Reputation: 5954

import socket
import select

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('httpbin.org', 80))
s.sendall(b'GET /image/png HTTP/1.1\r\nHOST: httpbin.org\r\n\r\n')

reply = b''

while select.select([s], [], [], 3)[0]:
    data = s.recv(2048)
    if not data: break
    reply += data

# the first blank line separates the headers from the body
headers, _, image = reply.partition(b'\r\n\r\n')

# save image
with open('image.png', 'wb') as f:
    f.write(image)

Note that this example is not perfect. The elegant way is to check the Content-Length header and recv exactly that many bytes, instead of hard-coding 3 seconds as a timeout. And if the server uses chunked encoding, it becomes even more complicated.
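As a rough illustration of the Content-Length approach, here is a minimal sketch. The names recv_response and fetch_image are made up for this example, and it assumes the server sends a Content-Length header and does not use chunked encoding; it is not a complete HTTP client.

```python
import socket


def recv_response(sock):
    """Read one HTTP response: the headers, then exactly Content-Length body bytes."""
    buf = b''
    # Keep reading until the blank line that ends the header block has arrived.
    while b'\r\n\r\n' not in buf:
        chunk = sock.recv(2048)
        if not chunk:
            raise ConnectionError('connection closed before headers were complete')
        buf += chunk
    head, _, body = buf.partition(b'\r\n\r\n')

    # Header names are case-insensitive, so compare them in lower case.
    length = None
    for line in head.split(b'\r\n')[1:]:
        name, _, value = line.partition(b':')
        if name.strip().lower() == b'content-length':
            length = int(value.strip().decode('ascii'))
            break
    if length is None:
        # Assumption of this sketch: chunked responses are simply rejected.
        raise ValueError('no Content-Length header in response')

    # Receive until the whole body is here, never asking for more than is missing.
    while len(body) < length:
        chunk = sock.recv(min(2048, length - len(body)))
        if not chunk:
            raise ConnectionError('connection closed before body was complete')
        body += chunk
    return head, body[:length]


def fetch_image(host, path, filename):
    """Hypothetical helper: request `path` from `host` and save the body to `filename`."""
    request = b'GET ' + path.encode('ascii') + b' HTTP/1.1\r\nHost: ' + host.encode('ascii') + b'\r\n\r\n'
    with socket.create_connection((host, 80), timeout=10) as s:
        s.sendall(request)
        _, body = recv_response(s)
    with open(filename, 'wb') as f:
        f.write(body)
```

Calling fetch_image('httpbin.org', '/image/png', 'image.png') would then replace the select loop above. Because the function stops once the announced number of bytes has arrived, it does not depend on the server closing the connection or on a timeout.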

--

The example is written for Python 3.

Upvotes: 2

Steffen Ullrich

Reputation: 123461

You are making an HTTP/1.1 request. This HTTP version implicitly behaves as if Connection: keep-alive had been set. This means the server might not close the TCP connection immediately after sending the response, as your code expects, but might keep it open to wait for more HTTP requests.

If you send the request as HTTP/1.0 instead, the server closes the connection once the response has been sent, so your read loop ends with the complete image, because HTTP/1.0 implies Connection: close.
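A minimal Python 3 sketch of the HTTP/1.0 variant might look like the following. The helper names build_request, recv_until_close and fetch are made up for this illustration, and it assumes the server really does close the connection after an HTTP/1.0 response.

```python
import socket


def build_request(host, path):
    """Build a minimal HTTP/1.0 GET request as bytes."""
    return ('GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' % (path, host)).encode('ascii')


def recv_until_close(sock):
    """Read until recv() returns b'', i.e. until the peer closes the connection."""
    chunks = []
    while True:
        data = sock.recv(2048)
        if not data:
            break
        chunks.append(data)
    return b''.join(chunks)


def fetch(host, path, port=80):
    """Hypothetical helper: send the request and return (headers, body) as bytes."""
    with socket.create_connection((host, port), timeout=10) as s:
        s.sendall(build_request(host, path))
        reply = recv_until_close(s)
    # The first blank line separates the headers from the body.
    head, _, body = reply.partition(b'\r\n\r\n')
    return head, body
```

With this, head, body = fetch('httpbin.org', '/image/png') followed by writing body to a file opened in 'wb' mode should produce the image. Sending Connection: close in an HTTP/1.1 request is an alternative, but an HTTP/1.1 server may still answer with chunked encoding, which this sketch does not decode.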

Apart from that: HTTP is way more complex than you might think. Please don't just design your code after some example messages you've seen somewhere but actually read and follow the standards if you really want to implement HTTP yourself.

Upvotes: 2
