NewbieProgrammer

Reputation: 177

How to check whether a URL is downloadable or not?

How do I check whether a given URL is downloadable using Python?

It should return True if the URL is downloadable, and False otherwise.

An example of a non-downloadable URL: www.google.com

Note: I am not talking about fetching the contents of the URL and saving them as a web page.

What is a downloadable URL?

If you go to a URL and a file starts to download, then it is a downloadable URL.

Example: https://drive.google.com/uc?id=1QOmVDpd8hcVYqqUXDXf68UMDWQZP0wQV&export=download

Note: It downloads the Stack Overflow annual survey 2019 data set.

Upvotes: 5

Views: 8253

Answers (4)


Reputation: 31

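Downloadable files must have a Content-Length header:

import requests

url = "https://drive.google.com/uc?id=1QOmVDpd8hcVYqqUXDXf68UMDWQZP0wQV&export=download"
# stream=True returns once the headers arrive; the body is not downloaded yet
r = requests.get(url, stream=True)

try:
    print(r.headers['content-length'])  # requests' headers are case-insensitive
except KeyError:
    print("Not Downloadable")
finally:
    r.close()  # release the connection without reading the body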

Upvotes: 0

Alen Paul Varghese

Reputation: 1552

So I tried searching for a better way. The site I was checking was a bit tricky: most Stack Overflow answers suggested sending a HEAD request to get the response headers, but that site returned a 404 error for it. When I used a GET request, the whole file was downloaded before the headers were printed. A friend suggested passing the parameter stream=True, and that worked.

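import requests

# example URL from the question; substitute the link you want to check
link = "https://drive.google.com/uc?id=1QOmVDpd8hcVYqqUXDXf68UMDWQZP0wQV&export=download"
# stream=True defers downloading the body, so only the headers are fetched here
r = requests.get(link, stream=True)
print(r.headers)
r.close()  # release the connection without downloading the file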
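For comparison, a standard-library sketch of the same idea, swapping `urllib.request` in for requests (an assumption, not the answerer's code). `urlopen()` returns once the status line and headers have been read, follows redirects by default, and never reads the body unless you call `.read()`. `fetch_headers` is a hypothetical name.

```python
import urllib.request


def fetch_headers(url: str, timeout: float = 10.0) -> dict[str, str]:
    """GET the URL and return its response headers without reading the body.

    Hypothetical stdlib stand-in for requests.get(url, stream=True).
    urlopen() raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Lowercase the names so lookups do not depend on the server's casing.
        # Leaving the with-block closes the response; the body is never read.
        return {name.lower(): value for name, value in resp.headers.items()}
```

For example, calling `fetch_headers` on the Google Drive link from the question returns a dict whose `"content-type"`, `"content-length"` and `"content-disposition"` entries (if the server sends them) can be inspected; which headers appear depends entirely on the server.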

Upvotes: 3

Abhishek J

Reputation: 2584

This can be done using the popular requests library:

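import requests

url = 'https://www.google.com'
# requests.head() does not follow redirects by default, so enable it explicitly
headers = requests.head(url, allow_redirects=True).headers
# An "attachment" disposition tells the browser to save the response as a file
downloadable = 'attachment' in headers.get('Content-Disposition', '')
print(downloadable)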

Content-Disposition header reference
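The substring test in this answer also matches headers such as `inline; filename="attachment.pdf"`. A minimal sketch that compares only the disposition type instead; `is_attachment` and `attachment_filename` are hypothetical helper names, and the filename parsing borrows the standard library's `email.message.Message.get_filename()` rather than anything in requests. Both accept the `headers` object from the answer.

```python
import email.message
from typing import Mapping, Optional


def _header(headers: Mapping[str, str], name: str) -> str:
    """Case-insensitive lookup; returns '' when the header is absent."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return ""


def is_attachment(headers: Mapping[str, str]) -> bool:
    """True if the Content-Disposition type is 'attachment'.

    Only the token before the first ';' is compared, so a header such as
    'inline; filename="attachment.pdf"' is not misclassified.
    """
    disposition = _header(headers, "Content-Disposition")
    return disposition.split(";", 1)[0].strip().lower() == "attachment"


def attachment_filename(headers: Mapping[str, str]) -> Optional[str]:
    """Return the suggested filename parameter, or None if there is none."""
    disposition = _header(headers, "Content-Disposition")
    if not disposition:
        return None
    # Reuse the stdlib MIME parser to handle quoting of the filename parameter.
    msg = email.message.Message()
    msg["Content-Disposition"] = disposition
    return msg.get_filename()


example = {"Content-Disposition": 'attachment; filename="survey.zip"'}
print(is_attachment(example))        # True
print(attachment_filename(example))  # survey.zip
```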

Upvotes: 12

Tejas Sarade

Reputation: 1212

At the HTTP protocol level, there is no distinction between downloadable and non-downloadable URLs. There is an HTTP request and a subsequent response. The response body can be a binary file, HTML, an image, etc.

You can request just the HTTP response headers, look at the Content-Type header, and decide whether you want to treat that content type as downloadable or non-downloadable.
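A minimal sketch of that decision, assuming a hypothetical policy in which only media types a browser normally renders as a page (`text/html`, `application/xhtml+xml`) count as non-downloadable. `VIEWABLE_TYPES`, `media_type` and `is_downloadable_by_type` are illustrative names; adjust the set to whatever you consider "downloadable".

```python
from typing import Mapping

# Hypothetical policy: media types rendered as a page count as "not downloadable";
# every other type is treated as a download.
VIEWABLE_TYPES = frozenset({"text/html", "application/xhtml+xml"})


def media_type(headers: Mapping[str, str]) -> str:
    """Return the bare media type, e.g. 'text/html' from 'text/html; charset=UTF-8'."""
    for name, value in headers.items():
        if name.lower() == "content-type":
            return value.split(";", 1)[0].strip().lower()
    return ""


def is_downloadable_by_type(headers: Mapping[str, str]) -> bool:
    """Classify a response as downloadable based only on its Content-Type."""
    mtype = media_type(headers)
    # A missing Content-Type gives no evidence either way; treat it as False here.
    return bool(mtype) and mtype not in VIEWABLE_TYPES


print(is_downloadable_by_type({"Content-Type": "text/html; charset=UTF-8"}))  # False
print(is_downloadable_by_type({"Content-Type": "application/zip"}))           # True
```

Parameters after `;` (such as `charset`) are stripped before comparison, so `text/html; charset=UTF-8` and `text/html` are classified the same way.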

Upvotes: 1
