Screwtape007

Reputation: 185

Read gz file in Python

I am trying to read/extract the contents of the file train.gz.

My code:

import gzip
with gzip.open('train.gz', 'rb') as f:
    file_content = f.read()

When I run:

print(file_content)

I get this error (in a Jupyter notebook):

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
/tmp/ipykernel_2392/4036593255.py in <module>
----> 1 print(file_content)

MemoryError: 

Any suggestions?

Upvotes: 0

Views: 306

Answers (1)

DazWilkin

Reputation: 40071

MemoryError suggests that the file is too big for your runtime to process.

If I'm guessing correctly, train.gz may be a training model, in which case you may have to handle it as a single chunk. If so, your best option is to find a machine with more memory.

If at all possible (and this is strongly preferred), stream the uncompressed file through your program. That lets you bound the buffer (the in-memory window onto the file) and so limits the chance of running out of memory.
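
A minimal sketch of what streaming could look like, assuming train.gz holds line-oriented UTF-8 text (that is a guess; the question doesn't say what is inside). The helper names iter_lines, iter_chunks and preview are made up for illustration. Only gzip.open and itertools.islice come from the standard library. Note that in the traceback above the error is raised by print(file_content), which has to build a text representation of the entire bytes object, so printing a small slice avoids that step as well.

```python
import gzip
import itertools


def iter_lines(path, encoding="utf-8"):
    """Yield decoded lines one at a time (assumes text content)."""
    # "rt" opens the gzip stream in text mode; lines are decompressed lazily.
    with gzip.open(path, "rt", encoding=encoding, errors="replace") as f:
        for line in f:
            yield line.rstrip("\n")


def iter_chunks(path, chunk_size=1024 * 1024):
    """Yield raw decompressed bytes in pieces of at most chunk_size bytes."""
    # Use this variant if the content is binary rather than text.
    with gzip.open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            yield chunk


def preview(path, n=5):
    """Return only the first n lines instead of the whole file."""
    return list(itertools.islice(iter_lines(path), n))


# Hypothetical usage in the notebook:
#   for line in preview("train.gz", 10):
#       print(line)
```

With either generator, only one line or one chunk is held in memory at a time, so the peak footprint depends on chunk_size (or the longest line) rather than on the size of the uncompressed file.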

Upvotes: 3
