Reputation: 21

ftp download file inside zip file using python

There's a zip file of about 600 MB on an FTP site that is updated daily. I created a script that does the following:

Connect to the FTP site that hosts the data
Download the zip file (600 MB)
Extract it to a local folder to get the one or two text files inside it that I'm interested in

My question is: do I need to download the whole 600 MB zip file every day just to get those .txt files? I'm trying to save time and money. Is there perhaps a library that can list the contents of the zip file and then download only the two text files I'm interested in?

Upvotes: 2

Answers (2)

ToxicMender

Reputation: 277

This is Python 3, but it shouldn't need many modifications to work in Python 2.7:

Note: this is an implementation-based suggestion, since extracting files from an archive is not a standard operation that an FTP server performs. If it were SFTP, it would be a different case, though.

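import zipfile as zf

# filename: path to the local copy of the zip file
with zf.ZipFile(filename, 'r') as zfobj:
    # print the decompressed contents of every member
    for name in zfobj.namelist():
        with zfobj.open(name, 'r') as fobj:
            print(fobj.read())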

To extract only the files that aren't already present locally (this assumes files are only ever appended to the zip, never modified):

import os
import zipfile as zf

with zf.ZipFile(filename, 'r') as zfobj:
    for name in zfobj.namelist():
        # skip directory entries and members already extracted earlier
        if name.endswith('/') or os.path.exists(name):
            continue
        zfobj.extract(name)  # writes to the current directory
        with zfobj.open(name, 'r') as fobj:
            print(fobj.read())

Upvotes: 1

Steffen Ullrich

Reputation: 123320

I doubt there is a publicly available library that already does this for you. Besides, questions asking for library recommendations are off-topic here. So instead I'll describe how you could implement such a feature yourself:

FTP does not really offer random access. The most you can probably do is the following.

First, get the file size with the SIZE command (if the server supports it), set the offset near the end of the file with the REST command, and then read until the end of the file with RETR. The last bytes of a ZIP file hold the end-of-central-directory record, which gives the offset and size of the central directory just before it. The central directory contains a header for each file in the archive, and each of these headers gives the offset of that file's local file header and the size of its compressed data.

Once you have found out this way which files you need and where they start, position at that offset with REST and start the download with RETR. Since FTP has no command to read only a specific number of bytes from a file, you have to use ABOR to stop the download once you have received enough data. Then skip the local file header (its file name and extra field are variable-length, so you have to parse it), take the compressed data that follows, and decompress it to get the file you want. For more info see ZIP file format - Structure.
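The procedure above can be sketched with Python's standard `ftplib`, `struct` and `zlib` modules. This is a minimal sketch under stated assumptions, not a finished tool: it assumes a single-disk archive without ZIP64 records (fewer than 65,535 entries and no offset or size above 4 GiB, which should hold for a 600 MB file), members that are either stored or deflated, and a server that supports SIZE and REST. The helper names (`list_members`, `read_member`, `ftp_file_size`, `ftp_range_reader`) and the host, path and member name in the example comment are hypothetical. One deliberate deviation from the steps above: instead of sending ABOR, each ranged read opens its own FTP session and simply drops it once enough bytes have arrived, which avoids parsing server-specific replies to an aborted transfer at the cost of one extra login per read.

```python
import ftplib
import struct
import zlib

EOCD_SIG = b"PK\x05\x06"  # end of central directory record
CDH_SIG = b"PK\x01\x02"   # central directory file header
LFH_SIG = b"PK\x03\x04"   # local file header


def list_members(read_range, file_size):
    """Map member name -> (local header offset, compressed size, method).

    read_range(start, length) must return bytes [start, start + length).
    Assumes a single-disk archive without ZIP64 records.
    """
    # The EOCD record is 22 bytes plus a comment of at most 65535 bytes,
    # so it is always inside this tail.
    tail_len = min(file_size, 22 + 0xFFFF)
    tail = read_range(file_size - tail_len, tail_len)
    eocd = tail.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("no end-of-central-directory record found")
    _, _, _, _, count, cd_size, cd_offset, _ = struct.unpack_from(
        "<4s4H2LH", tail, eocd)

    cd = read_range(cd_offset, cd_size)  # fetch the whole central directory
    members, pos = {}, 0
    for _ in range(count):
        (sig, _, _, flags, method, _, _, _, csize, _, name_len, extra_len,
         comment_len, _, _, _, offset) = struct.unpack_from(
            "<4s6H3L5H2L", cd, pos)
        if sig != CDH_SIG:
            raise ValueError("corrupt central directory")
        raw_name = cd[pos + 46:pos + 46 + name_len]
        # flag bit 11 marks UTF-8 names; otherwise ZIP uses code page 437
        name = raw_name.decode("utf-8" if flags & 0x800 else "cp437")
        members[name] = (offset, csize, method)
        pos += 46 + name_len + extra_len + comment_len
    return members


def read_member(read_range, offset, csize, method):
    """Fetch one member's compressed bytes and decompress them."""
    header = read_range(offset, 30)  # fixed-size part of the local header
    sig, _, _, _, _, _, _, _, _, name_len, extra_len = struct.unpack(
        "<4s5H3L2H", header)
    if sig != LFH_SIG:
        raise ValueError("bad local file header")
    # the local name/extra fields are variable-length, so skip past them
    data = read_range(offset + 30 + name_len + extra_len, csize)
    if method == 0:  # stored
        return data
    if method == 8:  # deflate: raw stream, hence negative wbits
        return zlib.decompress(data, -15)
    raise NotImplementedError(f"compression method {method}")


def ftp_file_size(host, path, user="anonymous", passwd=""):
    """Ask the server for the file size with SIZE, in binary mode."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user, passwd)
        ftp.voidcmd("TYPE I")
        return ftp.size(path)


def ftp_range_reader(host, path, user="anonymous", passwd=""):
    """Return read_range(start, length) built on REST + RETR.

    Instead of ABOR, each call uses its own session and drops it
    once enough bytes have arrived.
    """
    def read_range(start, length):
        ftp = ftplib.FTP(host)
        try:
            ftp.login(user, passwd)
            ftp.voidcmd("TYPE I")  # REST offsets are byte offsets here
            conn = ftp.transfercmd("RETR " + path, rest=start)
            buf = bytearray()
            with conn:
                while len(buf) < length:
                    chunk = conn.recv(min(65536, length - len(buf)))
                    if not chunk:  # end of file reached
                        break
                    buf += chunk
            return bytes(buf)
        finally:
            ftp.close()  # unilateral close; the cut-off RETR reply is never read
    return read_range

# Example (hypothetical host, path and member name):
# size = ftp_file_size("ftp.example.com", "/pub/daily.zip")
# read = ftp_range_reader("ftp.example.com", "/pub/daily.zip")
# members = list_members(read, size)
# text = read_member(read, *members["wanted.txt"]).decode()
```

The ZIP parsing is kept separate from the transport: `list_members` and `read_member` only need some `read_range(start, length)` function, so they can be checked against an in-memory archive before pointing them at the server. With two wanted files this sketch makes six short transfers (tail, central directory, and a header plus data read per member) instead of one 600 MB download.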

Upvotes: 0
