Joe B

Reputation: 1022

How can I get the file size from a link without downloading it in Python?

I have a list of links, and I need the size of each file to determine how many computational resources it will need. Is it possible to get just the file size with a GET request or something similar?

Here is an example of one of the links: https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887

Thanks

Upvotes: 12

Views: 17513

Answers (3)

ccpizza

Reputation: 31676

You need to use the HEAD method. The example uses requests (pip install requests).

#!/usr/bin/env python
# display URL file size without downloading

import sys
import requests

# pass URL as first argument
response = requests.head(sys.argv[1], allow_redirects=True)

# Content-Length is optional, so the server may omit it
size = response.headers.get('content-length')

if size is None:
    print('{:<40}: unknown'.format('FILE SIZE'))
else:
    # size in megabytes (Python 2, 3)
    print('{:<40}: {:.2f} MB'.format('FILE SIZE', int(size) / float(1 << 20)))

    # size in megabytes (f-string, Python 3 only)
    # print(f"{'FILE SIZE':<40}: {int(size) / float(1 << 20):.2f} MB")

Also see "How do you send a HEAD HTTP request in Python 2?" if you need a solution based on the standard library.
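
If the files vary a lot in size, the megabyte conversion in the script above could be generalized so the unit adapts to the value. The following is only a sketch: format_size is a hypothetical helper name, not part of requests, and it assumes binary units (1 KiB = 1024 bytes), matching the 1 << 20 divisor used above.

```python
def format_size(num_bytes):
    """Return a human-readable size using binary units (1 KiB = 1024 bytes).

    Hypothetical helper for illustration; not part of any library.
    """
    if num_bytes < 0:
        raise ValueError("size must be non-negative")

    units = ("B", "KiB", "MiB", "GiB", "TiB")
    size = float(num_bytes)
    index = 0
    # Divide by 1024 until the value fits the current unit or units run out.
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1

    if index == 0:
        # Plain byte counts are shown without a fractional part.
        return "{:d} B".format(int(num_bytes))
    return "{:.2f} {}".format(size, units[index])


if __name__ == "__main__":
    # 578220087 is the Content-Length reported for the example link.
    print(format_size(578220087))
```

You would pass it int(size) once you know the header is present.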

Upvotes: 8

Steven Graham

Reputation: 1300

To do this, use the HTTP HEAD method, which retrieves only the header information for the URL and doesn't download the content the way an HTTP GET request does.

$ curl -I https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 18 Mar 2019 16:56:35 GMT
Content-Type: application/octet-stream
Content-Length: 578220087
Last-Modified: Tue, 21 Feb 2017 12:13:19 GMT
Connection: keep-alive
Accept-Ranges: bytes

The file size is in the 'Content-Length' header. In Python 3.6:

>>> import urllib.request
>>> req = urllib.request.Request('https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887',
...                              method='HEAD')
>>> f = urllib.request.urlopen(req)
>>> f.status
200
>>> f.headers['Content-Length']
'578220087'
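
Since the question mentions a list of links, the same HEAD request could be wrapped in a small function and applied to each URL. This is a minimal sketch using only the standard library; parse_content_length, remote_file_size and sizes_for are names I made up for illustration, and the 10 second timeout is an arbitrary assumption. A missing or malformed Content-Length is reported as None rather than raising.

```python
import urllib.request


def parse_content_length(value):
    """Convert a Content-Length header value to an int, or None if absent/invalid."""
    if value is None:
        return None
    try:
        size = int(value.strip())
    except ValueError:
        return None
    return size if size >= 0 else None


def remote_file_size(url, timeout=10):
    """Send a HEAD request and return the reported size in bytes, or None."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_content_length(response.headers.get("Content-Length"))


def sizes_for(urls):
    """Map each URL to its reported size; failed lookups are stored as None."""
    sizes = {}
    for url in urls:
        try:
            sizes[url] = remote_file_size(url)
        except (OSError, ValueError):
            # URLError, HTTPError and socket timeouts are OSError subclasses;
            # ValueError covers strings that are not valid URLs.
            sizes[url] = None
    return sizes


if __name__ == "__main__":
    links = [
        "https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887",
    ]
    for link, size in sizes_for(links).items():
        print(link, size)
```

Treat None as "size unknown" when planning resources, since a server is not required to send Content-Length.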

Upvotes: 13

Vasilis G.

Reputation: 7846

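If you're using Python 3, you can do it using urlopen from urllib.request. Note that this sends a GET request, but the headers are available before the body is read, so closing the response without calling read() avoids downloading the whole file:

from urllib.request import urlopen
link = "https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887"
# the with block closes the connection without reading the body
with urlopen(link) as site:
    meta = site.info()
print(meta)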

This will output:

Server: nginx
Date: Mon, 18 Mar 2019 17:02:40 GMT
Content-Type: application/octet-stream
Content-Length: 578220087
Last-Modified: Tue, 21 Feb 2017 12:13:19 GMT
Connection: close
Accept-Ranges: bytes

The Content-Length header is the size of your file in bytes.
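
If a server leaves out Content-Length, one possible fallback is to ask for a single byte with a Range header and read the total from Content-Range. The response above includes Accept-Ranges: bytes, so this server appears to support it. This is only a sketch: size_from_content_range and size_via_range are hypothetical names, the timeout is an assumption, and servers that ignore Range will answer 200 instead of 206, in which case the function gives up and returns None.

```python
import urllib.request


def size_from_content_range(value):
    """Extract the total size from a value like 'bytes 0-0/578220087'.

    Returns None when the header is missing, malformed, or the total is '*'.
    """
    if not value or "/" not in value:
        return None
    total = value.rsplit("/", 1)[1].strip()
    return int(total) if total.isdecimal() else None


def size_via_range(url, timeout=10):
    """Request only the first byte and return the total size, or None."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        if response.status != 206:
            # The server ignored the Range header; leave without reading the body.
            return None
        return size_from_content_range(response.headers.get("Content-Range"))


if __name__ == "__main__":
    link = "https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887"
    print(size_via_range(link))
```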

Upvotes: 1
