user19562955

Reputation: 293

How can I solve the memory error that appears when reading the dataset?

Regardless of whether I set low_memory to True or False, I get the same error: Unable to allocate 13.5 GiB for an array with shape (4357, 415796) and data type float64

Upvotes: 0

Views: 41

Answers (1)

BeRT2me

Reputation: 13242

low_memory=True only reduces memory usage while parsing; it won't help at all with the total size of the file.

To process a file this large you'll need to work on it in chunks.

If some of your calculations need the whole file at once, you'll need to look into other options such as pyspark or dask.

import pandas as pd

# IIUC, this should be chunks of roughly ~1.2 GiB each
# (400 rows x 415,796 float64 columns x 8 bytes):
with pd.read_csv('file.csv', chunksize=400) as reader:
    for chunk in reader:
        ...  # Do stuff with each chunk.
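For example, many reductions (sums, counts, means) can be accumulated chunk by chunk so the full table never has to fit in memory at once. A minimal sketch, using a tiny in-memory CSV as a stand-in for your actual file:

```python
import io
import pandas as pd

# Tiny in-memory CSV standing in for the real file (an assumption
# for illustration; replace with your file path and chunksize=400).
csv_data = io.StringIO("a,b\n1,2\n3,4\n5,6\n")

# Accumulate per-column sums chunk by chunk, so only `chunksize`
# rows are ever held in memory at once.
total = None
with pd.read_csv(csv_data, chunksize=2) as reader:
    for chunk in reader:
        s = chunk.sum()
        total = s if total is None else total.add(s, fill_value=0)

print(total)  # a: 9, b: 12
```

The same pattern works for any computation that can be expressed as a running aggregate over row batches.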

Upvotes: 0
