LearnerBegineer

Reputation: 23

MemoryError: Unable to allocate 11.0 GiB for an array with shape (120, 12300000) and data type object

I am trying to read a 40GB file in pandas and perform some operations on it. I am using chunks, but I am getting a MemoryError. (System RAM = 32 GB)

Code

import pandas as pd

df = pd.DataFrame()
for chunk in pd.read_csv('file.csv', low_memory=False, chunksize=50000):
    df = df.append(chunk)

How should I write my code in order to read such a large file?

Upvotes: 2

Views: 1676

Answers (1)

Troy D

Reputation: 2245

"You can't have a DataFrame larger than your machine's RAM."

https://tomaugspurger.github.io/modern-8-scaling.html

If you're reading a 40GB file into 32GB of RAM, I don't think that'll work. Can you perform your operations on the chunks themselves and save the results, instead of operating on the entire dataset at once?
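
As a minimal sketch of what chunk-wise processing could look like (assuming your operation is a groupby-style aggregation; some_column and summary.csv are placeholders for your actual column and output path):

import pandas as pd

results = []
for chunk in pd.read_csv('file.csv', low_memory=False, chunksize=50000):
    # do the work on each 50k-row chunk so only one chunk is in memory at a time
    results.append(chunk.groupby('some_column').size())
# combine the small per-chunk results, not the raw rows
summary = pd.concat(results).groupby(level=0).sum()
summary.to_csv('summary.csv')

This way only the per-chunk results need to fit in memory, not the full 40GB of raw rows.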

BTW, if you're building a DataFrame from chunks, rather than appending each chunk to the same DataFrame in each iteration, it'll be faster to collect them in a list and then concat them at the end. Otherwise, pandas has to create a new massive DataFrame on every iteration.

dfs = []
for chunk in pd.read_csv('file.csv', low_memory=False, chunksize=50000):
    # collect each chunk in a list instead of growing a DataFrame every iteration
    dfs.append(chunk)
# build the full DataFrame once, with a single concat
df = pd.concat(dfs)

Upvotes: 3
