lagrangian_headache

Reputation: 149

Reading file from a ZIP archive on FTP server without downloading to local system

My target file on the FTP server is a ZIP archive, and the .csv file I need is nested two folders deep inside it.

How would I be able to use BytesIO to allow pandas to read the csv without downloading it?

This is what I have so far:

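from ftplib import FTP
from io import BytesIO

ftp = FTP('FTP_SERVER')
ftp.login('USERNAME', 'PASSWORD')
flo = BytesIO()
# Stream the archive into memory instead of writing it to disk
ftp.retrbinary('RETR /ParentZipFolder.zip', flo.write)
flo.seek(0)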

With flo as my BytesIO object of interest, how would I be able to navigate a few folders down within the object, to allow pandas to read my .csv file? Is this even necessary?

Upvotes: 5

Views: 1167

Answers (1)

Serge Ballesta

Reputation: 148880

The zipfile module accepts a file-like object as the archive, and ZipFile.open returns a file-like object for an individual member, so you can extract the csv file without writing the archive to disk. Since read_csv also accepts a file-like object, this should work fine, provided the whole archive fits in available memory:

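from io import BytesIO
from zipfile import ZipFile

import pandas as pd

...
flo = BytesIO()
ftp.retrbinary('RETR /ParentZipFolder.zip', flo.write)
flo.seek(0)
with ZipFile(flo) as archive:
    # Member paths inside a ZIP always use forward slashes
    with archive.open('foo/fee/bar.csv') as fd:
        df = pd.read_csv(fd)  # add relevant options here, including encoding if it matters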
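
If the exact path of the csv inside the archive is not known in advance, ZipFile.namelist() can be used to find it. The following is only a minimal sketch: find_members is a hypothetical helper name, and the in-memory archive built here (with the placeholder path foo/fee/bar.csv from the snippet above) stands in for the BytesIO object filled by ftp.retrbinary.

```python
from io import BytesIO
from zipfile import ZipFile


def find_members(buffer, suffix=".csv"):
    """Return the full paths of archive members whose names end with `suffix`.

    Hypothetical helper for illustration; `buffer` is any seekable
    file-like object holding a ZIP archive.
    """
    with ZipFile(buffer) as archive:
        # namelist() yields full member paths such as 'foo/fee/bar.csv'
        return [name for name in archive.namelist()
                if name.lower().endswith(suffix)]


# Stand-in for the FTP download: build a small ZIP archive in memory
flo = BytesIO()
with ZipFile(flo, "w") as archive:
    archive.writestr("foo/fee/bar.csv", "a,b\n1,2\n3,4\n")
    archive.writestr("foo/readme.txt", "not a csv")
flo.seek(0)

csv_paths = find_members(flo)
print(csv_paths)
```

Any of the returned paths can then be passed to archive.open and the resulting file object handed to pd.read_csv, as in the snippet above.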

Upvotes: 6
