ta ling

Reputation: 1

Memory usage is at 29GB when loading a big dataframe

I have 15 CSV files. I load all 15 of them and concatenate them with the code below:

[Jupyter notebook]

%%time
import pandas as pd

df1 = pd.read_csv("test1.csv")
# ... repeated 15 times, up to df15 = pd.read_csv("test15.csv")

# Combine the dataframes
total_df = pd.concat([df1, df2, df3, df4, df5, df6, df7, df8, df9, df10, df11, df12, df13, df14, df15])

When I check my memory, it is at 29 GB used. I tried to delete the intermediate dataframes with the code below:

import gc

del df1, df2, df3, df4, df5, df6, df7, df8, df9, df10, df11, df12, df13, df14, df15
gc.collect()

But this doesn't work; the memory usage is still 29 GB.
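To see how much of that is the dataframes themselves (as opposed to what the OS reports for the whole process), something like this sketch using pandas' memory_usage could be used:

# Rough sketch: measure what each dataframe itself occupies,
# independent of the process-level memory reported by the OS.
for name, df in [("df1", df1), ("total_df", total_df)]:
    mb = df.memory_usage(deep=True).sum() / 1024**2
    print(f"{name}: {mb:,.1f} MB")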

Is there a way to release the memory used by the intermediate dataframes and keep only the memory used by total_df? I am planning to perform further processing on total_df, but the kernel dies when I load such a big dataframe. I hope to reduce the memory usage first and then run more tasks on total_df, such as using PCA to reduce the dimensionality.
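For the PCA step, this is roughly what I have in mind (just a sketch using scikit-learn's IncrementalPCA; the number of components and the chunk size are placeholders):

from sklearn.decomposition import IncrementalPCA
import numpy as np

# Fit the PCA in chunks so the full 1024-column frame is never copied at once.
ipca = IncrementalPCA(n_components=50)  # placeholder value
chunk_size = 100_000                    # placeholder value

for start in range(0, len(total_df), chunk_size):
    chunk = total_df.iloc[start:start + chunk_size].to_numpy(dtype=np.float32)
    ipca.partial_fit(chunk)

reduced = np.vstack([
    ipca.transform(total_df.iloc[start:start + chunk_size].to_numpy(dtype=np.float32))
    for start in range(0, len(total_df), chunk_size)
])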

Also, the shape of each dataframe is about (900000, 1024), so in total there are more than 13 million rows, each with over 1,000 columns.

Upvotes: 0

Views: 161

Answers (1)

mozway

Reputation: 260335

You might want to try loading the files directly inside concat, so the intermediate dataframes are never bound to names and can be garbage-collected as soon as they are consumed:

files = ['test1.csv', 'test2.csv']  # ... through 'test15.csv'

total_df = pd.concat(pd.read_csv(f) for f in files)
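If the columns are all numeric, reading them as float32 roughly halves the footprint on top of that (a sketch, assuming the data currently loads as the default float64):

import numpy as np
import pandas as pd

files = [f'test{i}.csv' for i in range(1, 16)]  # following the naming in the question

total_df = pd.concat(
    (pd.read_csv(f, dtype=np.float32) for f in files),
    ignore_index=True,
)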

Upvotes: 1
