Reputation: 110093
I have the following url, which exists:
https://s3-us-west-1.amazonaws.com/premiere-avails/458ca3ce-c51e-4f69-8950-7af3e44f0a3d__chapter025.jpg
But this one does not:
https://s3-us-west-1.amazonaws.com/premiere-avails/459ca3ce-c51e-4f69-8950-7af3e44f0a3d__chapter025.jpg
Is there a way to check whether a URL is valid without downloading the file (it may be a 1GB file)? Note that I do not want to use boto to see if the key exists; I would like to use an HTTP request.
Upvotes: 3
Views: 3631
Reputation: 111
I'd use the requests Python library. The function would look like this:
import requests

def check_url(url):
    """
    Checks if the S3 link exists.

    Parameters:
        url (str): link to check if exists.

    Returns:
        bool: True if exists, False otherwise
    """
    # HEAD fetches only the status line and headers, not the file body.
    request = requests.head(url)
    return request.status_code == 200
The requests.head() function returns a requests.Response object from which you can read many different values. If you want to accept any status code below 400, you can use request.ok instead of comparing request.status_code == 200. requests.head() also accepts parameters such as timeout; see the requests documentation for this function.
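To illustrate the timeout and request.ok points, here is a minimal sketch of the same check that also treats network failures as "does not exist". The 5-second default and the optional session argument are assumptions added for illustration, not part of the original function.

```python
import requests


def check_url(url, timeout=5.0, session=None):
    """Return True if a HEAD request to `url` gets a status below 400."""
    # `session` is a hypothetical hook: pass a requests.Session to reuse
    # connections; by default the module-level requests.head is used.
    http = session if session is not None else requests
    try:
        response = http.head(url, timeout=timeout)
    except requests.RequestException:
        # Timeouts, DNS failures, refused connections: treat as unreachable.
        return False
    # Response.ok is True for any status code below 400.
    return response.ok
```

Whether an unreachable host should count as False or raise is a design choice; returning False keeps the caller simple but hides the difference between "missing" and "could not check".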
Upvotes: 1
Reputation: 40963
Try this:
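import http.client
from urllib.parse import urlparse

def url_exists(url):
    # Python 3 names: httplib is now http.client, urlparse lives in urllib.parse.
    parts = urlparse(url)
    # Match the connection class to the URL scheme so https URLs use TLS.
    conn_cls = http.client.HTTPSConnection if parts.scheme == 'https' else http.client.HTTPConnection
    conn = conn_cls(parts.netloc)
    # HEAD returns only the status line and headers, never the body.
    conn.request('HEAD', parts.path or '/')
    status = conn.getresponse().status
    conn.close()
    return status < 400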
Upvotes: 7
Reputation: 45846
You could use curl. The --head option sends a HEAD request rather than a GET, so it does not return the body even if the file exists.
curl --head https://s3-us-west-1.amazonaws.com/premiere-avails/458ca3ce-c51e-4f69-8950-7af3e44f0a3d__chapter025.jpg
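
For comparison, here is a rough Python sketch of what the curl command does, using only the standard library: it sends a HEAD request and reports the status code plus the Content-Length header, so you can see the file size without transferring the body. The function name head_info and the 5-second timeout are assumptions made for this example.

```python
import urllib.error
import urllib.request


def head_info(url, timeout=5.0):
    """Send a HEAD request and return (status, content_length or None)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status, headers = resp.status, resp.headers
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; the error still carries
        # the status code and the response headers.
        status, headers = err.code, err.headers
    length = headers.get("Content-Length")
    return status, int(length) if length is not None else None
```

Connection-level failures (urllib.error.URLError, timeouts) are deliberately left to propagate in this sketch, so the caller can tell "server said no" apart from "could not reach the server".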
Upvotes: 1