Jason

Reputation: 14605

How to check HTTP status of a file online without fully downloading the file?

I have a database of thousands of files online, and I want to check their status (e.g. whether the file exists, whether the URL returns a 404, etc.) and update this in my database.

I've used urllib.request to download files in a Python script, roughly as sketched below. However, downloading terabytes of files obviously takes a long time. Parallelizing the process would help, but ultimately I don't want to download all the data; I just want to check the status. Is there an ideal way to check the HTTP response code of a given URL (using urllib or another package)?
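For context, the current approach looks roughly like this (the URL is a placeholder); it reads the full file just to confirm the request succeeds:

import urllib.request

url = 'https://www.google.com'  # placeholder for a URL from the database

# This downloads the entire file even though only the status is needed.
with urllib.request.urlopen(url) as response:
    data = response.read()   # full download
    print(response.status)   # e.g. 200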

Additionally, if I can get the file size from the server (which would be in the HTTP response headers), then I can also update that in my database.

Upvotes: 0

Views: 560

Answers (2)

Shrmn

Reputation: 398

The requests module can check the status code of a response. Just do:

import requests

url = 'https://www.google.com'  # Change to your link
response = requests.get(url)
print(response.status_code)  # e.g. 200 for a successful request

This prints 200 for me, so the request was successful.
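Note that requests.get downloads the whole response body, which the question is trying to avoid. As a hedged variation on this answer (not part of the original), passing stream=True makes requests fetch only the status line and headers up front, so the connection can be closed before the body is transferred:

import requests

url = 'https://www.google.com'  # placeholder; substitute a file URL

# stream=True fetches only the status line and headers up front;
# the body is not downloaded unless response.content is accessed.
with requests.get(url, stream=True) as response:
    print(response.status_code)                    # e.g. 200
    print(response.headers.get('Content-Length'))  # file size, if the server reports it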

Upvotes: 1

Tim Roberts

Reputation: 54698

If your web server is standards-based, you can use a HEAD request instead of a GET. It returns the same status without actually fetching the page.
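For example, a minimal sketch with urllib.request (the library the question already uses; the URL is a stand-in), where method='HEAD' asks the server for only the status and headers:

import urllib.request
import urllib.error

url = 'https://www.google.com'  # stand-in URL

# method='HEAD' requests the status and headers only, no body.
request = urllib.request.Request(url, method='HEAD')
try:
    with urllib.request.urlopen(request) as response:
        print(response.status)                         # e.g. 200
        print(response.headers.get('Content-Length'))  # file size, if reported
except urllib.error.HTTPError as err:
    print(err.code)                                    # e.g. 404 for a missing file

The Content-Length header, when present, also gives the file size the question asks about.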

Upvotes: 2
