Reputation: 709
Situation: The file to be downloaded is large (>100MB). Downloading it takes quite some time, especially on a slow internet connection.
Problem: I only need the file header (the first 512 bytes), which determines whether the whole file needs to be downloaded.
Question: Is there a way to download only the first 512 bytes of a file?
Additional information: Currently the download is done using urllib.urlretrieve in Python 2.7.
Upvotes: 6
Views: 2460
Reputation: 2876
If the URL you are trying to read responds with a Content-Length header, you can get the file size with urllib2 in Python 2.
import urllib2

def get_file_size(url):
    # Send a HEAD request so that only the response headers are transferred
    request = urllib2.Request(url)
    request.get_method = lambda: 'HEAD'
    response = urllib2.urlopen(request)
    length = response.headers.getheader("Content-Length")
    return int(length)
The function can be called to get the length and compared with some threshold value to decide whether to download or not.
if get_file_size("http://stackoverflow.com") < 1000000:
    pass  # Download the file here
(Note that the Python 3 implementation differs slightly:)
from urllib import request

def get_file_size(url):
    # Same HEAD request, built with urllib.request
    r = request.Request(url, method='HEAD')
    response = request.urlopen(r)
    length = response.getheader("Content-Length")
    return int(length)
Upvotes: 0
Reputation: 937
I think curl and head would work better than a Python solution here:
curl https://my.website.com/file.txt | head -c 512 > header.txt
EDIT: Also, if you absolutely must have it in a Python script, you can use subprocess to run the curl command piped to head.
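A rough sketch of that subprocess idea, assuming curl and head are both installed and on the PATH. The function name first_bytes_via_curl is made up for illustration, and the two-process pipe follows the usual pattern for replacing a shell pipeline rather than anything specific to this answer:

```python
import subprocess

def first_bytes_via_curl(url, num_bytes=512):
    """Run the equivalent of `curl URL | head -c N` and return the bytes."""
    # Hypothetical helper: assumes the `curl` and `head` executables exist.
    curl = subprocess.Popen(["curl", "--silent", url], stdout=subprocess.PIPE)
    head = subprocess.Popen(
        ["head", "-c", str(num_bytes)],
        stdin=curl.stdout,
        stdout=subprocess.PIPE,
    )
    # Close our copy of the pipe so curl sees a broken pipe once head exits
    # and stops downloading instead of fetching the whole file.
    curl.stdout.close()
    data, _ = head.communicate()
    # curl may exit non-zero because head closed the pipe early; that is
    # expected here, so the return code is deliberately ignored.
    curl.wait()
    return data
```

Calling first_bytes_via_curl("https://my.website.com/file.txt") should then give roughly the same bytes that the shell command above writes to header.txt.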
EDIT 2: For a fully Python solution: The urlopen function (urllib2.urlopen in Python 2, and urllib.request.urlopen in Python 3) returns a file-like stream that you can use the read function on, which allows you to specify a number of bytes. For example, urllib2.urlopen(my_url).read(512) will return the first 512 bytes of my_url.
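For Python 3, the read approach could look something like the sketch below. The helper name read_header is made up for illustration, and the Range header is an optional extra that is not part of the original suggestion: servers that support range requests will send only the requested bytes, while others ignore it and the read(512) call still limits how much is consumed.

```python
from urllib import request

def read_header(url, num_bytes=512):
    """Return up to the first num_bytes bytes of the resource at url."""
    # Assumption: hint to the server that only the leading bytes are wanted.
    # Servers without range support simply ignore this header.
    req = request.Request(
        url, headers={"Range": "bytes=0-%d" % (num_bytes - 1)}
    )
    with request.urlopen(req) as response:
        # Read at most num_bytes; leaving the with-block closes the
        # connection, so the rest of the file is not downloaded.
        return response.read(num_bytes)
```

The returned bytes can then be inspected (for example, compared against an expected signature) before deciding whether to fetch the whole file with a regular download call.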
Upvotes: 2