Reputation: 177
How do I check whether a given URL is downloadable using Python?
It should return True if the URL is downloadable, and False otherwise.
An example of a non-downloadable URL: www.google.com
Note: I am not talking about fetching the contents of the URL and saving them as a web page.
What is a downloadable URL?
If opening a URL (possibly after redirects) makes a file start to download, then it is a downloadable URL.
Example: https://drive.google.com/uc?id=1QOmVDpd8hcVYqqUXDXf68UMDWQZP0wQV&export=download
Note: it downloads the Stack Overflow annual survey 2019 data set.
Upvotes: 5
Views: 8253
Reputation: 31
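Downloadable files usually send a Content-Length header, so you can check for it:
import requests
url = "https://drive.google.com/uc?id=1QOmVDpd8hcVYqqUXDXf68UMDWQZP0wQV&export=download"
# stream=True fetches the headers without downloading the body
with requests.get(url, stream=True) as r:
    try:
        print(r.headers['content-length'])
    except KeyError:  # no Content-Length header in the response
        print("Not Downloadable")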
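A Content-Length header is a weak signal on its own: many HTML pages send one too, and responses that use chunked transfer encoding omit it. Below is a minimal sketch (content_length is a hypothetical helper, not part of requests) that parses the header into an int from any headers mapping so it can be combined with other checks. It lowercases the keys itself, assuming the input might be a plain dict rather than the case-insensitive mapping requests returns.

```python
from typing import Mapping, Optional


def content_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return Content-Length as an int, or None if it is absent or malformed."""
    # HTTP header names are case-insensitive, but a plain dict is not,
    # so normalize the keys before looking the header up.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("content-length")
    if raw is None:
        return None
    try:
        size = int(raw.strip())
    except ValueError:
        return None
    # A negative length is not valid HTTP; treat it as missing.
    return size if size >= 0 else None
```

Note that an HTML page with {"Content-Type": "text/html", "Content-Length": "5120"} also yields 5120 here, which is why this check does not separate pages from files by itself.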
Upvotes: 0
Reputation: 1552
I looked for a better way, because the site I was checking was a bit tricky.
Most Stack Overflow answers suggest sending a HEAD request to get the response headers, but that site returned a 404 error for it. When I used a GET request, the whole file was downloaded before the headers were printed. A friend suggested passing stream=True, and that worked.
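import requests
link = "https://drive.google.com/uc?id=1QOmVDpd8hcVYqqUXDXf68UMDWQZP0wQV&export=download"
# stream=True returns as soon as the headers arrive; the body is not read yet
r = requests.get(link, stream=True)
print(r.headers)
r.close()  # release the connection without downloading the body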
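The same approach can be sketched with only the standard library, swapping in urllib.request for requests (fetch_headers is a hypothetical helper name). It tries HEAD first and, if the server answers with an HTTP error such as the 404 described above, falls back to a GET whose body is never read: urlopen() returns once the status line and headers arrive, and it follows redirects by default.

```python
import urllib.error
import urllib.request
from email.message import Message


def fetch_headers(url: str, timeout: float = 10.0) -> Message:
    """Return the response headers for url without reading the response body."""
    try:
        # HEAD asks the server for the headers only.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.headers
    except urllib.error.HTTPError as err:
        # Some servers reject HEAD (e.g. with 404 or 405); fall back to GET.
        err.close()
    # urlopen() returns after the headers arrive; leaving the with-block
    # closes the connection without reading the body.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.headers
```

The returned object is an http.client.HTTPMessage, so lookups such as headers.get("Content-Disposition") are case-insensitive. Network errors other than an HTTP status (DNS failure, timeout) are not caught and propagate to the caller.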
Upvotes: 3
Reputation: 2584
This can be done using the popular requests library:
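import requests
url = 'https://www.google.com'
# head() does not follow redirects by default, so enable it explicitly
headers = requests.head(url, allow_redirects=True).headers
# e.g. "Content-Disposition: attachment; filename=data.zip" tells the browser to download
downloadable = 'attachment' in headers.get('Content-Disposition', '')
print(downloadable)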
See the Content-Disposition header reference.
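A plain substring test can misfire: the value inline; filename="attachment.pdf" contains the word "attachment" but does not request a download. A sketch that parses the header with the standard library's email.message.Message instead, where get_content_disposition() returns the lowercased disposition type and get_filename() the filename parameter; parse_content_disposition and is_attachment are hypothetical names.

```python
from email.message import Message
from typing import Optional, Tuple


def parse_content_disposition(value: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Split a Content-Disposition value into (disposition type, filename)."""
    if not value:
        return None, None
    msg = Message()
    msg["Content-Disposition"] = value
    # get_content_disposition() drops the parameters and lowercases the type;
    # get_filename() reads the (unquoted) filename parameter, if any.
    return msg.get_content_disposition(), msg.get_filename()


def is_attachment(value: Optional[str]) -> bool:
    """True only when the disposition type itself is "attachment"."""
    disposition, _ = parse_content_disposition(value)
    return disposition == "attachment"
```

This would be called as is_attachment(headers.get('Content-Disposition')) with the headers from the snippet above.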
Upvotes: 12
Reputation: 1212
At the HTTP protocol level, there is no distinction between downloadable and non-downloadable URLs. There is an HTTP request and a subsequent response, and the response body can be a binary file, HTML, an image, etc.
You can request just the HTTP response headers, look at Content-Type, and decide whether you want to treat that content type as downloadable or not.
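A minimal sketch of that decision, under an assumed policy that HTTP itself does not define: a response counts as downloadable if it carries an attachment disposition, or otherwise if its media type is not one a browser normally renders as a page. It uses email.message.Message to drop parameters such as charset; is_downloadable and PAGE_TYPES are hypothetical names, and the header values would come from any of the requests calls above.

```python
from email.message import Message
from typing import Optional

# Assumption: media types a browser normally renders as a page instead of
# downloading. Adjust this set to your own definition of "downloadable".
PAGE_TYPES = {"text/html", "application/xhtml+xml"}


def is_downloadable(content_type: Optional[str],
                    content_disposition: Optional[str] = None) -> bool:
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    if content_disposition:
        msg["Content-Disposition"] = content_disposition
    # An explicit "attachment" disposition asks the browser to download.
    if msg.get_content_disposition() == "attachment":
        return True
    # Otherwise decide by media type; get_content_type() lowercases it and
    # strips parameters. With no Content-Type header it returns "text/plain",
    # which this policy treats as downloadable.
    return msg.get_content_type() not in PAGE_TYPES
```

For example, is_downloadable("text/html; charset=UTF-8") is False, while is_downloadable("application/zip") is True.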
Upvotes: 1