I am trying to define a function that resumes a download if the connection is broken. However, the following code does not work as expected. On line 8 (`position = f.tell()-1024`), I have to manually subtract one chunk size for it to work. Otherwise, the final file is missing exactly one chunk's worth of bytes for each time I resume it.
```python
if os.path.exists(fileName):
    header = requests.head(url)
    fileLength = int(header.headers['Content-Length'])
    if fileLength == os.path.getsize(fileName):
        return True
    else:
        with open(fileName, 'ab') as f:
            position = f.tell()-1024
        pos_header = {}
        print(position)
        pos_header['Range'] = f'bytes={position}-'
        with requests.get(url, headers = pos_header, stream = True) as r:
            with open(fileName, 'ab') as f:
                #some validation should be here
                for chunk in r.iter_content(chunk_size=1024):
                    if chunk:
                        f.write(r.content)
                        f.flush()
        print(os.path.getsize(fileName))
else:
    with requests.get(url, allow_redirects=True, stream = True) as r:
        with open(fileName, 'wb') as f:
            iter = 0
            for chunk in r.iter_content(chunk_size = 1024):
                if chunk:
                    f.write(chunk)
                    f.flush()
                    iter += 1
                    if iter > 2000:
                        break
```
Interestingly, the missing bytes are always at the boundary between the two parts of the download. Is there a more elegant way to fix this than what I did?
You have a bug in the code that downloads the rest of the file on a resumed attempt. The bug is in the following line:
```python
f.write(r.content)
```
It should be:
```python
f.write(chunk)
```
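Basically, you're iterating over chunks but writing the entire content, and that messes things up. With `stream=True`, by the time the loop body first runs, `iter_content` has already pulled the first 1024-byte chunk off the stream, so `r.content` reads and returns only what remains after it. The stream is then exhausted and the loop ends. Each resume therefore drops exactly one chunk at the join, which is why subtracting 1024 from the position appeared to fix it. Once you write `chunk`, you can drop the `-1024` and request the range starting at the current file size.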
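For completeness, here is a minimal sketch of the resume logic with the fix applied and no `-1024` adjustment. It assumes the server supports HTTP range requests. The function name `resume_download` and its `get` parameter are hypothetical: `get` is meant to be `requests.get` itself and is passed in only so the logic can be exercised without a network. A `206 Partial Content` reply is appended to the partial file, a plain `200` (the server ignored `Range`) restarts the file from scratch, and `416 Range Not Satisfiable` is treated as "most likely already complete".

```python
import os


def resume_download(url, file_name, get, chunk_size=1024):
    """Download url into file_name, resuming from any bytes already on disk.

    Hypothetical helper: `get` is assumed to behave like requests.get
    (pass requests.get itself in real use).
    """
    # Range offsets are zero-based, so a partial file of N bytes already holds
    # bytes 0..N-1 and the next request starts at N (no "-1024" needed).
    start = os.path.getsize(file_name) if os.path.exists(file_name) else 0
    headers = {"Range": f"bytes={start}-"} if start else {}

    with get(url, headers=headers, stream=True) as r:
        if r.status_code == 416:
            # Requested start is past the end: the file is most likely complete.
            return start
        r.raise_for_status()
        # 206 means the server honoured the Range header; a plain 200 means it
        # sent the whole file, so overwrite instead of appending a duplicate.
        mode = "ab" if start and r.status_code == 206 else "wb"
        with open(file_name, mode) as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                if chunk:
                    f.write(chunk)  # write the chunk, not r.content
    return os.path.getsize(file_name)
```

In real use this would be called as `resume_download(url, fileName, requests.get)`; calling it again after a dropped connection picks up from the current file size.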