Reputation: 1
I have 15 CSV files. I load all of them and concatenate them into one dataframe with the code below:
[Jupyter notebook]
%%time
import pandas as pd

df1 = pd.read_csv("test1.csv")
# ... the same read_csv call is repeated for df2 through df15

# Combine the dataframes
total_df = pd.concat([df1,df2,df3,df4,df5,df6,df7,df8,df9,df10,df11,df12,df13,df14,df15])
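One alternative I have been considering is reading the files in a loop so that no intermediate dataframes stay bound to separate names (just a sketch; the test{i}.csv pattern is how my files happen to be named):
import pandas as pd

# Collect the pieces in a list and concatenate once; the intermediates
# only live inside the list and can be dropped in one step afterwards.
pieces = [pd.read_csv(f"test{i}.csv") for i in range(1, 16)]
total_df = pd.concat(pieces, ignore_index=True)
del pieces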
When I check my memory, it's at 29 GB used. I tried the code below to delete the intermediate dataframes:
import gc
del [[df1, df2, df3, df4, df5, df6, df7, df8, df9, df10,df11,df12,df13,df14,df15]]
gc.collect()
But it still doesn't work and memory usage stays at 29 GB.
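To see how much of the 29 GB is actually held by total_df itself, I also checked its size directly (a sketch; memory_usage(deep=True) should account for object-dtype columns as well):
# Size of total_df in GB, counting object-dtype columns too
print(total_df.memory_usage(deep=True).sum() / 1024**3)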
Is there a way to free the memory used by the intermediate dataframes and keep only total_df in memory? I am planning to do further processing on total_df, but the kernel just dies when I work with such a big dataframe. I hope to reduce the memory usage first and then run more tasks on total_df, such as PCA to reduce the dimensionality.
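For the PCA step, what I have in mind is something like scikit-learn's IncrementalPCA, which fits in batches so the whole matrix never has to be processed at once (a sketch; n_components and the batch size are placeholder values, and it assumes all 1024 columns are numeric):
import numpy as np
from sklearn.decomposition import IncrementalPCA

n_components, batch = 50, 100_000   # placeholders, not tuned values

ipca = IncrementalPCA(n_components=n_components)
for start in range(0, len(total_df), batch):
    chunk = total_df.iloc[start:start + batch].to_numpy(dtype="float32")
    if len(chunk) >= n_components:  # partial_fit needs at least n_components rows
        ipca.partial_fit(chunk)

# Transform in the same chunked fashion to keep peak memory low
reduced = np.vstack([
    ipca.transform(total_df.iloc[s:s + batch].to_numpy(dtype="float32"))
    for s in range(0, len(total_df), batch)
])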
Also, the shape of each dataframe is about (900000, 1024), so the combined dataframe has over 13 million rows and more than 1,000 columns.
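Assuming the columns are numeric, one other thing I am considering to cut memory before the heavy processing is downcasting from float64 to float32, which should roughly halve the numeric part (a sketch; the astype call will temporarily need extra memory for the copy):
# Downcast float64 columns to float32 to roughly halve their memory
float_cols = total_df.select_dtypes(include="float64").columns
total_df[float_cols] = total_df[float_cols].astype("float32")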
Upvotes: 0
Views: 161