Jed

Reputation: 676

Multithreading/Async download from single FTP server

I need to download many files from a single folder on a single server, so I'm looking for a way to do it faster. After a little reading, it seems that either a multithreaded or an asynchronous approach would work, but I can't get either one to work.

The async approach I'm using is below. It works, i.e. it raises no errors, but it only downloads one file at a time, so it doesn't improve speed. Is there a way to modify it so that it does?

import asyncio
import aioftp

async def get_file(self):
    async with aioftp.ClientSession(self.host, self.port, self.login, self.password) as client:
        async for path, info in client.list(recursive=True):
            if info["type"] == "file":
                await client.download(path, destination=self.dest_dir, write_into=True, block_size=self.block_size)


def async_update(self):
    loop = asyncio.get_event_loop()
    loop.run_until_complete(self.get_file())
    loop.close()

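Then I tried the simple Pool() from multiprocessing, as below:

from multiprocessing import Pool

def simple_fetch(self, filename):
    # `ftp` is not defined in this snippet; presumably an FTP connection created elsewhere
    with open(self.dest_dir + filename, 'wb') as file:
        # 2**3 rather than 2^3: ^ is bitwise XOR in Python, so 2^3 evaluates to 1
        ftp.retrbinary('RETR ' + filename, file.write, 8192 * 2**3)

def multi_fetch(self):
    pool = Pool()
    pool.map(self.simple_fetch, self.update_files)
    pool.close()
    pool.join()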

But this fails with an error. I'll update with that error as soon as I'm back at the server.

Upvotes: 3

Views: 4342

Answers (2)

broomrider

Reputation: 656

I'm the author of aioftp. The reason you can't speed up downloads this way is that an FTP session is limited to exactly one data connection, so you can't download multiple files at the same time over one client connection, only sequentially. Also, your code will not work as written, because you use the lazy list: the listing is still being consumed when the download starts. If you want to try to speed up your downloads, you need multiple client sessions, but if the server does not throttle download speed per connection, you will see no speedup.
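As a rough illustration of the "multiple client sessions" idea, here is a minimal sketch. It assumes a recent aioftp release where `aioftp.Client.context(...)` is the session context manager (the question uses `aioftp.ClientSession`, so adjust for your version). The names `split_round_robin`, `download_bucket`, `download_all` and the `n_sessions` parameter are hypothetical, made up for this example. The listing is fetched in full before any download starts, and each bucket of files gets its own session and therefore its own data connection.

```python
import asyncio


def split_round_robin(paths, n_sessions):
    """Deal paths out to n_sessions buckets and drop buckets that stay empty."""
    buckets = [paths[i::n_sessions] for i in range(n_sessions)]
    return [bucket for bucket in buckets if bucket]


async def download_bucket(host, port, login, password, paths, dest_dir):
    import aioftp  # third-party; imported here so the helper above has no dependency

    # One session per bucket, so each bucket has its own data connection.
    async with aioftp.Client.context(host, port, login, password) as client:
        for path in paths:
            # Files within one session are still downloaded sequentially.
            await client.download(path, dest_dir)


async def download_all(host, port, login, password, dest_dir, n_sessions=4):
    import aioftp

    # Fetch the complete listing first instead of iterating it lazily.
    async with aioftp.Client.context(host, port, login, password) as client:
        listing = await client.list(recursive=True)
    files = [path for path, info in listing if info["type"] == "file"]

    # Run one download session per bucket concurrently.
    await asyncio.gather(*(
        download_bucket(host, port, login, password, bucket, dest_dir)
        for bucket in split_round_robin(files, n_sessions)
    ))
```

Whether this helps depends on the server: if it limits speed per connection you may gain something, otherwise the sessions just share the same bandwidth. Many servers also cap the number of simultaneous connections per user, so keep `n_sessions` small.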

Upvotes: 5

digitalsentinel

Reputation: 192

For the async approach, you will want to build a list of files to download and run the downloads concurrently. You are only calling simple_get once, so there is only one instance of the download running. See this example as @Klas-d mentioned.
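The general pattern of "build a list, then run the calls concurrently" might look like the sketch below. `fake_download` is a hypothetical stand-in that only sleeps, and `run_concurrently` and `limit` are names invented for this example. With FTP, each concurrent download would also need its own client session, since one session has a single data connection.

```python
import asyncio


async def fake_download(name, delay=0.01):
    # Hypothetical stand-in for a real download coroutine.
    await asyncio.sleep(delay)
    return name


async def run_concurrently(names, limit=3):
    # Cap how many downloads are in flight at once.
    semaphore = asyncio.Semaphore(limit)

    async def bounded(name):
        async with semaphore:
            return await fake_download(name)

    # gather schedules all coroutines and returns results in input order.
    return await asyncio.gather(*(bounded(name) for name in names))
```

Calling `asyncio.run(run_concurrently(file_names))` from synchronous code starts the event loop and waits for all of the tasks to finish.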

Upvotes: 0
