Reputation: 1
I have a zip file that contains a single CSV. I would like to read that CSV row by row without extracting the whole file from the archive.
The underlying CSV is simply too big to extract, so I need a workaround.
Upvotes: 0
Views: 30
Reputation: 13152
You can stream the zip archive and read just the first row with:
import zipfile

with zipfile.ZipFile("final_analysis_data.zip") as z:  # 100 MB compressed
    with z.open("final_analysis_data.csv") as f:  # 650 MB uncompressed
        # The member is opened in binary mode, so each line is bytes
        first_row = next(f).decode()
        input("check memory usage now, press enter to continue")
        print(first_row)
The input() statement just pauses the script so you can verify that the entire archive is not being read into memory. In this example, with a 100 MB archive holding a 650 MB CSV, the Python process uses 6 MB of RAM.
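If you want parsed rows rather than raw lines, a minimal sketch along the same lines is to wrap the open member in io.TextIOWrapper and hand it to csv.reader. The helper name iter_csv_rows and the UTF-8 default are my own assumptions for illustration, not something from your setup, so adjust the encoding to match your file:

```python
import csv
import io
import zipfile


def iter_csv_rows(zip_path, member_name, encoding="utf-8"):
    """Yield parsed CSV rows one at a time from a member of a zip archive.

    Hypothetical helper: the name and the default encoding are assumptions.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # archive.open() returns a binary file-like object that is
        # decompressed incrementally as it is read
        with archive.open(member_name) as raw:
            # newline="" lets the csv module handle line endings itself,
            # which matters for quoted fields containing newlines
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            for row in csv.reader(text):
                yield row
```

You would then loop with something like `for row in iter_csv_rows("final_analysis_data.zip", "final_analysis_data.csv"): ...`. Because it is a generator, only the current row and the decompression buffers should be held in memory at any time.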
Note:
If you feel that this resolves your issue, you might consider closing it as a duplicate of:
Read a large zipped text file line by line in python
rather than accepting an answer.
Upvotes: 2