Reputation: 1022
I have a list of links, and I need the size of each file to determine how many computational resources it needs. Is it possible to get just the file size with a GET request or something similar?
Here is an example of one of the links: https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887
Thanks
Upvotes: 12
Views: 17513
Reputation: 31676
You need to use the HEAD method. The example uses requests (pip install requests).
#!/usr/bin/env python
# display URL file size without downloading
import sys
import requests
# pass URL as first argument
response = requests.head(sys.argv[1], allow_redirects=True)
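size = response.headers.get('content-length')
if size is None:
    # the server did not report a size (e.g. chunked transfer encoding)
    sys.exit('Content-Length header not present')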
# size in megabytes (Python 2, 3)
print('{:<40}: {:.2f} MB'.format('FILE SIZE', int(size) / float(1 << 20)))
# size in megabytes (f-string, Python 3.6+ only)
# print(f"{'FILE SIZE':<40}: {int(size) / float(1 << 20):.2f} MB")
Also see "How do you send a HEAD HTTP request in Python 2?" if you need a standard-library-based solution.
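Since the question involves a list of links, the same HEAD approach can be sketched for Python 3 with only the standard library. The function names below (`parse_content_length`, `remote_size`, `sizes_for`) and the 10-second timeout are illustrative assumptions, not part of the answer above. A `None` result marks a size the server did not report or a request that failed.

```python
from urllib.request import Request, urlopen


def parse_content_length(value):
    """Return the header value as an int, or None if absent or malformed."""
    if value is None:
        return None
    try:
        size = int(value)
    except ValueError:
        return None
    return size if size >= 0 else None


def remote_size(url, timeout=10):
    """Send a HEAD request and return the reported size in bytes (or None)."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return parse_content_length(response.headers.get("Content-Length"))


def sizes_for(urls):
    """Map each URL to its reported size; None means unknown or failed."""
    results = {}
    for url in urls:
        try:
            results[url] = remote_size(url)
        except (OSError, ValueError):
            # OSError covers URLError/HTTPError; ValueError covers malformed URLs
            results[url] = None
    return results
```

Calling `sizes_for(links)` on the list of links returns a dictionary keyed by URL, so the files can be sorted or grouped by size before any download starts.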
Upvotes: 8
Reputation: 1300
To do this, use the HTTP HEAD method, which fetches only the header information for the URL and doesn't download the content the way an HTTP GET request does.
$ curl -I https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 18 Mar 2019 16:56:35 GMT
Content-Type: application/octet-stream
Content-Length: 578220087
Last-Modified: Tue, 21 Feb 2017 12:13:19 GMT
Connection: keep-alive
Accept-Ranges: bytes
The file size is in the 'Content-Length' header. In Python 3.6:
>>> import urllib.request
>>> req = urllib.request.Request('https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887',
...                              method='HEAD')
>>> f = urllib.request.urlopen(req)
>>> f.status
200
>>> f.headers['Content-Length']
'578220087'
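Some servers reject HEAD requests or leave out Content-Length. Because the response above advertises `Accept-Ranges: bytes`, one possible fallback is to request only the first byte and read the total size from the `Content-Range` header. This is a sketch under that assumption; the helper names are hypothetical, and it has not been checked against this particular server.

```python
from urllib.request import Request, urlopen


def total_from_content_range(value):
    """Extract the total from e.g. 'bytes 0-0/578220087'; None if unknown."""
    if not value or "/" not in value:
        return None
    total = value.rsplit("/", 1)[1].strip()
    # A total of '*' means the server does not know the full length.
    return int(total) if total.isdecimal() else None


def size_via_range(url, timeout=10):
    """Ask for the first byte only and return the file's total size in bytes."""
    request = Request(url, headers={"Range": "bytes=0-0"})
    with urlopen(request, timeout=timeout) as response:
        if response.status == 206:  # server honoured the range request
            return total_from_content_range(response.headers.get("Content-Range"))
        # Range header ignored: Content-Length, if present, is the full size.
        length = response.headers.get("Content-Length")
        return int(length) if length and length.isdecimal() else None
```

The body is never read; leaving the `with` block closes the connection, so at most a small part of the file is transferred.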
Upvotes: 13
Reputation: 7846
If you're using Python 3, you can do it using urlopen from urllib.request:
from urllib.request import urlopen
link = "https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887"
site = urlopen(link)
meta = site.info()
print(meta)
This will output:
Server: nginx
Date: Mon, 18 Mar 2019 17:02:40 GMT
Content-Type: application/octet-stream
Content-Length: 578220087
Last-Modified: Tue, 21 Feb 2017 12:13:19 GMT
Connection: close
Accept-Ranges: bytes
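The Content-Length header is the size of your file in bytes. Note that urlopen sends a GET request here, so call site.close() once you have the headers to avoid transferring the file body.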
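Since Content-Length is a raw byte count, a small formatter makes the value easier to compare across files. The sketch below uses binary units (1 KiB = 1024 bytes); the function name `human_size` is an assumption introduced for illustration.

```python
def human_size(num_bytes):
    """Format a byte count using binary units (KiB, MiB, GiB, TiB)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} TiB"


# Content-Length value taken from the header output above
print(human_size(578220087))  # prints "551.43 MiB"
```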
Upvotes: 1