Paul S.

Reputation: 66364

Abort a request after checking response headers

I have a script that requests a URL via urllib.request's urlopen and then gets its info().

I don't need the rest of the response once I have these headers, so at the moment I just leave the request as it is and forget about it. That seems wrong: the connection is presumably left open, and the server may still be sending a body that simply gets ignored.

How can I abort the request properly?

#!/usr/bin/python3

import urllib.request

response = urllib.request.urlopen('http://google.co.uk')
headers = dict(response.info())
print(headers)
# now finished with response, abort???
# ... more stuff

Upvotes: 1

Views: 518

Answers (1)

John

Reputation: 13699

I think what you want is a HEAD request. Something like this (written for Python 2; in Python 3 the httplib module is named http.client):

>>> import httplib
>>> c = httplib.HTTPConnection("www.google.co.uk")
>>> c.request("HEAD", "/index.html")
>>> r = c.getresponse()
>>> r.getheaders()
[('x-xss-protection', '1; mode=block'), ('transfer-encoding', 'chunked'), ('set-cookie', 'PREF=ID=7867b0a5641d5f7b:FF=0:TM=1363882090:LM=1363882090:S=EXLl2JgBqzMKODcq; expires=Sat, 21-Mar-2015 16:08:10 GMT; path=/; domain=.google.co.uk, NID=67=qElAph6eqHyYKbh995ivP4B-21YRDRED4-uRXx0AvC3vLpv0SF1LkdsI2k6Hg1IhsatrVVqWf2slcMCaQsAZwZ89YfU0F1iPVBdt9PC2FItff31oRJ3gvhJVTQLa_RAt; expires=Fri, 20-Sep-2013 16:08:10 GMT; path=/; domain=.google.co.uk; HttpOnly'), ('expires', '-1'), ('server', 'gws'), ('cache-control', 'private, max-age=0'), ('date', 'Thu, 21 Mar 2013 16:08:10 GMT'), ('p3p', 'CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."'), ('content-type', 'text/html; charset=ISO-8859-1'), ('x-frame-options', 'SAMEORIGIN')]
>>>
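
Since the question uses Python 3 and urllib.request, here is a rough sketch of the same idea there. The function name fetch_headers and the 10 second timeout are my own choices for illustration, not anything from the question. Note that turning the headers into a dict keeps only one value for repeated headers such as Set-Cookie.

```python
import urllib.request


def fetch_headers(url, timeout=10):
    """Send a HEAD request to url and return the response headers as a dict.

    fetch_headers is a hypothetical helper name; timeout=10 is an
    arbitrary default chosen for this sketch.
    """
    # method="HEAD" asks the server for the headers only, with no body.
    request = urllib.request.Request(url, method="HEAD")
    # Leaving the with-block closes the response and its connection.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Repeated headers (e.g. Set-Cookie) collapse to one entry here.
        return dict(response.headers.items())
```

Calling fetch_headers('http://google.co.uk') should then give roughly the same information as the dict(response.info()) line in the question, provided the server answers HEAD requests.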

From w3.org (RFC 2616, section 9.4):

The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.

The response to a HEAD request MAY be cacheable in the sense that the information contained in the response MAY be used to update a previously cached entity from that resource. If the new field values indicate that the cached entity differs from the current entity (as would be indicated by a change in Content-Length, Content-MD5, ETag or Last-Modified), then the cache MUST treat the cache entry as stale.
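
If a particular server does not answer HEAD properly, the fallback is the GET from the question with an explicit close. The sketch below assumes that is acceptable for your case; headers_then_close is a hypothetical name and the timeout is arbitrary. Closing the response without calling read() drops the connection, although the server may already have sent part of the body by then.

```python
import urllib.request


def headers_then_close(url, timeout=10):
    """GET url, keep only the headers, and close without reading the body.

    Returns (headers, closed), where closed reports whether the response
    object was closed. The helper name and timeout are illustrative.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Headers are available as soon as urlopen returns.
        headers = dict(response.headers.items())
    # The with-block has called response.close(); read() was never called.
    return headers, response.closed
```

The with statement is the important part: it guarantees close() runs even if something raises while the headers are being processed.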

Upvotes: 1
