Mari

Reputation: 698

MemoryError occurs while working with a large dataset

I have interpolated data in 3 NumPy arrays.

Each is of length 107952899.

Problem I am facing

When I combine these three NumPy arrays into a pandas DataFrame, I get a MemoryError.

Reason for converting to a DataFrame

I have to do some calculations, and pandas makes them easier, so I preferred to work with pandas. I believe the combined memory size of the three NumPy arrays exceeds 3 GB.

System Details:

8 GB RAM, Python 3.6.3

Requirement

I understand the reason for such an error, but is there any way to avoid the MemoryError, or some other best practice to follow?

Upvotes: 1

Views: 897

Answers (1)

Adrien Pacifico

Reputation: 1939

When I combine these three NumPy arrays into a pandas DataFrame, I get a MemoryError.

Let's say that you do:

import numpy as np
import pandas as pd

# three 1-D float64 arrays of 10**7 elements each (8 bytes per element, ~80 MB per array)
big_array_1 = np.random.random(10**7)
big_array_2 = np.random.random(10**7)
big_array_3 = np.random.random(10**7)

On my computer, it takes around 300 MB of memory.
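If you want to check the raw size of the arrays yourself, here is a quick sketch using the nbytes attribute (the difference from my ~300 MB measurement is interpreter and allocator overhead):

total_bytes = sum(a.nbytes for a in (big_array_1, big_array_2, big_array_3))
print(total_bytes)  # 240000000 bytes, i.e. 3 * 80 MB of raw float64 data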

Then if I do:

df = pd.DataFrame([big_array_1, big_array_2, big_array_3])

The memory usage soars to about 9 GB of RAM. This happens because a list of arrays is interpreted as a list of rows, so pandas builds a 3 × 10,000,000 DataFrame and has to copy and restructure the data to do so. If you scale that up by a factor of 10 (to get your 3 GB of data instead of my 300 MB), you reach roughly 90 GB, which is probably more than your RAM plus available swap, and that raises a MemoryError.
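To see the row-versus-column difference without allocating gigabytes, here is a small sketch with a toy array (the array a is just for illustration):

a = np.arange(5)
print(pd.DataFrame([a, a, a]).shape)                 # (3, 5): each array becomes a row
print(pd.DataFrame({"A": a, "B": a, "C": a}).shape)  # (5, 3): each array becomes a column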

But if instead, you do:

df = pd.DataFrame({"A": big_array_1, "B": big_array_2, "C": big_array_3})

then your memory usage will not be significantly larger than that of your three arrays.
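You can verify this with DataFrame.memory_usage (a quick sketch, assuming the df built from the dict above):

print(df.memory_usage())        # per-column usage: 80000000 bytes (8 * 10**7) for each of A, B, C
print(df.memory_usage().sum())  # total stays close to the 240 MB of the raw arrays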

I suspect that this is your issue.

Upvotes: 2
