polliew

Reputation: 1

How to read first row of a zipped csv in python

I currently have a zip file that holds a single CSV. I would like to read that CSV row by row without extracting the entire file from the zip.

The underlying CSV is simply too big to extract, so I need a workaround.

Upvotes: 0

Views: 30

Answers (1)

JonSG

Reputation: 13152

You can stream the CSV straight out of the zip archive and get the contents of the first row via:

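import zipfile

# ZipFile.open() returns a file-like object that decompresses on demand,
# so the CSV is never extracted in full.
with zipfile.ZipFile("final_analysis_data.zip") as z:  # 100 MB compressed
    with z.open("final_analysis_data.csv") as f:       # 650 MB uncompressed
        first_row = next(f).decode()  # lines come back as bytes; decode to str
        input("Check memory usage now, press Enter to continue")
print(first_row)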

The input() call just pauses the script so you can verify that the entire archive is not being read into memory. With a 100 MB archive holding a 650 MB CSV, as in this example, the Python process uses about 6 MB of RAM.
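Since the question asks for rows rather than raw lines, the same streaming approach can be combined with the csv module. The following is only a sketch: the helper name iter_zipped_csv_rows and the UTF-8 default encoding are assumptions for illustration, not part of the original code.

```python
import csv
import io
import zipfile


def iter_zipped_csv_rows(zip_path, member_name, encoding="utf-8"):
    """Yield parsed CSV rows from one member of a zip archive, one at a time.

    Hypothetical helper; the encoding default is an assumption about the data.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # archive.open() gives a binary stream that decompresses on demand.
        with archive.open(member_name) as raw:
            # Wrap the binary stream so csv.reader receives text;
            # newline="" is what the csv module expects from its input.
            with io.TextIOWrapper(raw, encoding=encoding, newline="") as text:
                yield from csv.reader(text)
```

Calling next(iter_zipped_csv_rows("final_analysis_data.zip", "final_analysis_data.csv")) would then give the first row as a list of fields, and a for loop over the generator would walk the remaining rows without loading the whole file.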

Note:

If you feel that this resolves your issue, you might consider closing the question as a duplicate of:

Read a large zipped text file line by line in python

rather than accepting an answer.

Upvotes: 2
