Totes McGoats

Reputation: 121

How to open and read an LZMA file in memory

I have a giant file, let's call it one-csv-file.xz. It is an XZ-compressed CSV file.

How can I open and parse through the file without first decompressing it to disk? What if the file is, for example, 100 GB? Python cannot read all of that into memory at once, of course. Will it page or run out of memory?

Upvotes: 9

Views: 10883

Answers (2)

MRocklin

Reputation: 57281

You can iterate over an LZMAFile object line by line, so only a small part of the decompressed data is held in memory at any time:

import lzma  # Python 3; try lzmaffi in Python 2
# Open in binary mode: LZMAFile needs the raw compressed bytes
with open('one-csv-file.xz', 'rb') as compressed:
    with lzma.LZMAFile(compressed) as uncompressed:
        for line in uncompressed:  # each line is a bytes object
            do_stuff_with(line)
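
Since the file is a CSV, a variation on the loop above is to let lzma.open decode the stream as text and hand it straight to csv.reader. This is only a sketch: the helper name iter_csv_rows and the UTF-8 default are assumptions, not something from the question.

```python
import csv
import lzma


def iter_csv_rows(path, encoding="utf-8"):
    """Yield parsed CSV rows from an XZ-compressed file, one row at a time.

    Hypothetical helper for illustration; the encoding is an assumption.
    """
    # "rt" asks lzma.open for a text-mode wrapper that decodes bytes to str.
    # newline="" is the setting the csv module recommends for its input files.
    with lzma.open(path, "rt", encoding=encoding, newline="") as text_stream:
        for row in csv.reader(text_stream):
            yield row
```

Usage would look like `for row in iter_csv_rows('one-csv-file.xz'): do_stuff_with(row)`. Rows are produced lazily, so the whole file is never decompressed at once.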

Upvotes: 8

Mark Adler

Reputation: 112394

You can decompress incrementally. See "Compression using the LZMA algorithm" (the lzma module) in the Python standard library documentation. You create an LZMADecompressor object, then call its decompress method with successive chunks of the compressed data to get successive chunks of the uncompressed data.
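
A minimal sketch of that approach, assuming Python 3.5 or later (for the max_length argument) and a file containing a single XZ stream. The function name iter_decompressed and both size parameters are made up for illustration.

```python
import lzma


def iter_decompressed(fileobj, chunk_size=64 * 1024, max_output=1024 * 1024):
    """Yield decompressed bytes from a binary file object in bounded pieces.

    Hypothetical helper: handles one XZ stream and stops at its end;
    concatenated streams are not handled here.
    """
    decompressor = lzma.LZMADecompressor()
    while not decompressor.eof:
        if decompressor.needs_input:
            # The decompressor has used up its buffered input; read more.
            chunk = fileobj.read(chunk_size)
            if not chunk:
                break  # input ended before the stream was complete
        else:
            # Output is still pending from earlier input; feed nothing new.
            chunk = b""
        # max_length caps how much decompressed data one call may return.
        data = decompressor.decompress(chunk, max_length=max_output)
        if data:
            yield data
```

Usage would be along the lines of `with open('one-csv-file.xz', 'rb') as f: for piece in iter_decompressed(f): ...`. The pieces do not end on line boundaries, so a CSV parser on top has to carry any partial last line over to the next piece. Memory use is bounded by the two size parameters rather than by the size of the file.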

Upvotes: 3
