Reputation: 149
My target file on the FTP server is a ZIP file, and the .CSV is located two folders further in.
How would I be able to use BytesIO to allow pandas to read the csv without downloading it?
This is what I have so far:
from ftplib import FTP
from io import BytesIO
ftp = FTP('FTP_SERVER')
ftp.login('USERNAME', 'PASSWORD')
flo = BytesIO()
ftp.retrbinary('RETR /ParentZipFolder.zip', flo.write)
flo.seek(0)
With flo as my BytesIO object of interest, how would I be able to navigate a few folders down within the object, to allow pandas to read my .csv file? Is this even necessary?
Upvotes: 5
Views: 1167
Reputation: 148880
The zipfile module accepts file-like objects for both the archive and the individual files, so you can extract the CSV file without writing the archive to disk. And since read_csv also accepts a file-like object, this should work fine (provided you have enough available memory to hold the archive):
from zipfile import ZipFile
import pandas as pd
...
flo = BytesIO()
ftp.retrbinary('RETR /ParentZipFolder.zip', flo.write)
flo.seek(0)
with ZipFile(flo) as archive:
    # Paths inside a ZIP archive always use forward slashes
    with archive.open('foo/fee/bar.csv') as fd:
        df = pd.read_csv(fd)  # add relevant options here, including encoding if it matters
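If you are not sure of the exact path of the CSV inside the archive, you can list the archive's members first. The following is only a minimal sketch: `find_csv_members` is a hypothetical helper name, and the small in-memory archive (with made-up paths) merely stands in for the buffer filled by `ftp.retrbinary`.

```python
from io import BytesIO
from zipfile import ZipFile


def find_csv_members(buffer):
    """Return the paths of all .csv entries in an in-memory ZIP archive."""
    with ZipFile(buffer) as archive:
        # namelist() returns every member path, e.g. 'foo/fee/bar.csv'
        return [name for name in archive.namelist()
                if name.lower().endswith('.csv')]


# Stand-in for the downloaded archive (hypothetical contents)
demo = BytesIO()
with ZipFile(demo, 'w') as archive:
    archive.writestr('foo/fee/bar.csv', 'a,b\n1,2\n')
    archive.writestr('foo/readme.txt', 'not a csv')
demo.seek(0)

csv_paths = find_csv_members(demo)
```

Any path returned this way can then be passed to `archive.open(...)` and on to `pd.read_csv` exactly as above.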
Upvotes: 6