Reputation: 2612
I've been searching on this for a couple of days and haven't found an answer yet.
I'm trying to download video files from an FTP server. My script checks the server, compares the nlist() against a list of already-downloaded files parsed from a text file, builds a new list of files to get, and then iterates over it, downloading each file and disconnecting from the server before reconnecting for the next one (I thought a server timeout might be an issue, so I quit() the connection after each download).
This works for the first few files, but as soon as I hit a file that takes longer than 5 minutes, ftplib just hangs at the end of the transfer (I can see in Explorer that the file is the correct size, so the download has completed, but the script doesn't seem to get the message and move on to the next file).
Any help would be greatly appreciated; my code is below:
import os
from download import downloadFile   # download.py is shown below

newPath = "Z:\\pathto\\downloads\\"

# getFiles and validExtensions are built earlier in the script
for f in getFiles:
    print("Getting " + f)

for f in getFiles:
    fil = f.rstrip()
    ext = os.path.splitext(fil)[1]
    if ext in validExtensions:
        print("Downloading new file: " + fil)
        downloadFile(fil, newPath)
Here is download.py:
from ftplib import FTP

def downloadFile(filename, folder):
    myhost = 'host'
    myuser = 'user'
    passw = 'pass'

    # login
    ftp = FTP(myhost, myuser, passw)
    localfile = open(folder + filename, 'wb')
    ftp.retrbinary("RETR " + filename, localfile.write, 1024)
    print("Downloaded " + filename)
    localfile.close()
    ftp.quit()
Upvotes: 11
Views: 12422
Reputation: 2177
I do this; note that tf is an open file handle that is passed in. I've redacted some details, but the general premise is to check how much data has been downloaded and abort the FTP transfer once the downloaded amount matches the file size.
In my case, the issue has been that the transfer basically hangs once all the data has been downloaded; the server never closes the connection or whatever.
from ftplib import FTP

class FileCompleteException(Exception):
    pass

def download_file(filename, tf, size=None):
    def callback(data):
        tf.write(data)
        if size == tf.tell():
            raise FileCompleteException('Done!')

    with FTP(host='ftp.example.com',
             user='user',
             passwd='xxx') as ftp:
        try:
            ftp.retrbinary(f'RETR {filename}', callback)
        except FileCompleteException:
            pass
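For reference, a minimal usage sketch; the host, credentials and filename below are placeholders, and it assumes the server answers SIZE. Look the remote size up first, then let the callback above cut the transfer off once that many bytes have arrived:

from ftplib import FTP

# Placeholder host/credentials/filename; assumes the server supports SIZE.
with FTP(host='ftp.example.com', user='user', passwd='xxx') as ftp:
    ftp.voidcmd('TYPE I')               # some servers only answer SIZE in binary mode
    remote_size = ftp.size('video.mp4')

with open('video.mp4', 'wb') as tf:
    download_file('video.mp4', tf, size=remote_size)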
Upvotes: 1
Reputation: 579
Based on abarnet's solution (which was still hanging at the end), I've written this, which finally works :-)
import ftplib
from tempfile import SpooledTemporaryFile

MEGABYTE = 1024 * 1024

def download(ftp_host, ftp_user, ftp_pass, ftp_path, filename):
    ftp = ftplib.FTP(ftp_host, ftp_user, ftp_pass, timeout=3600)  # timeout: 1 hour
    ftp.cwd(ftp_path)
    filesize = ftp.size(filename) / MEGABYTE
    print(f"Downloading: {filename} SIZE: {filesize:.1f} MB")

    with SpooledTemporaryFile(max_size=MEGABYTE, mode="w+b") as ff:
        sock = ftp.transfercmd('RETR ' + filename)
        while True:
            buff = sock.recv(MEGABYTE)
            if not buff:
                break
            ff.write(buff)
        sock.close()
        ff.rollover()  # force saving of the final chunk to HDD!!
        ff.seek(0)     # prepare for data reading
        print("Reading the buffer...")
        # alldata = ff.read()
        # upload_file_to_adls(filename, alldata, account_name, account_key, container, adls_path)

    ftp.quit()
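A hypothetical call, with every argument a placeholder:

# Placeholder host, credentials, remote path and filename.
download('ftp.example.com', 'user', 'pass', '/videos', 'episode01.mp4')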
Upvotes: 1
Reputation: 366003
Without more information, I can't actually debug your problem, so I can only suggest the most general answer. This probably won't be necessary for you, but it should be sufficient for anyone.
retrbinary will block until the entire file is done. If that's longer than 5 minutes, nothing gets sent over the control channel for the entire 5 minutes, so either your client is timing out the control channel, or the server is. So when you try to hang up with ftp.quit(), it will either hang forever or raise an exception.
You can control your side's timeout with a timeout argument on the FTP constructor. Some servers support an IDLE command to let you set the server-side timeout. But even if the appropriate one turns out to be doable, how do you pick an appropriate timeout in the first place?
What you really want to do is prevent the control socket from timing out while a transfer is happening on the data socket. But how? If you, e.g., send ftp.voidcmd('NOOP') every so often in your callback function, that'll be enough to keep the connection alive… but it'll also force you to block until the server responds to the NOOP, which many servers will not do until the data transfer is complete. That means you'll just end up blocking forever (or until a different timeout) and not getting your data.
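To make that concrete, here is a sketch of the approach this paragraph warns against; the function and names are illustrative, not from the question:

def retr_with_naive_keepalive(ftp, filename, localfile):
    # Send a NOOP from inside the retrbinary callback every so often.
    # voidcmd() waits for the server's reply, and many servers won't send
    # one until the data transfer finishes, so this can block indefinitely.
    count = 0

    def callback(data):
        nonlocal count
        localfile.write(data)
        count += 1
        if count % 1000 == 0:
            ftp.voidcmd('NOOP')    # may not return until the transfer ends

    ftp.retrbinary('RETR ' + filename, callback)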
The standard techniques for handling two sockets without one blocking on the other are a multiplexer like select.select, or threads. You can do that here, but you will have to give up the simple retrbinary interface and instead use transfercmd to get the data socket explicitly.
For example:
def downloadFile(…):
    ftp = FTP(…)
    sock = ftp.transfercmd('RETR ' + filename)

    def background():
        f = open(…)
        while True:
            block = sock.recv(1024*1024)
            if not block:
                break
            f.write(block)
        sock.close()

    t = threading.Thread(target=background)
    t.start()
    while t.is_alive():
        t.join(60)
        ftp.voidcmd('NOOP')
An alternative solution would be to read, say, 20 MB at a time, then call ftp.abort(), and use the rest argument to resume the transfer with each new retrbinary until you reach the end of the file. However, ABOR could hang forever, just like that NOOP, so that doesn't guarantee anything; not to mention that servers don't have to respond to it.
What you could do is just close the whole connection down (not quit, but close). This is not very nice to the server, may result in some wasted data being re-sent, and may also prevent TCP from doing its usual ramp-up to full speed if you kill the sockets too quickly. But it should work.
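In code, the difference is just this (assuming ftp is the connection in question):

# ftp.quit() sends QUIT and then waits for a reply on the very control
# connection that has stopped responding, so it can hang; ftp.close()
# simply closes the control socket on our side with no further exchange.
ftp.close()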
See this answer—and notice that it requires a bit of testing against your particular broken server to figure out which, if any, variation works correctly and efficiently.
Upvotes: 35