Reputation: 11
I've been trying to run a particular cell in Google Colab for a while and keep running into the same issue. The cell runs for about 20-25 minutes, then the runtime runs out of memory (RAM), stops the code, and restarts, so all variables are lost. First, I used `del` on variables that get re-initialized in the next iteration, and called gc.collect() after deleting them. When that didn't work, I noticed that a couple of lists grew on every iteration. I removed those lists and wrote the information to a CSV file instead, then read the CSV back in after the for loop, rather than appending to a list on every iteration. That didn't solve the issue either. I'm on the free version of Colab, not Colab Pro+.
Any assistance would be greatly appreciated. Thanks!
Upvotes: 1
Views: 6503
Reputation: 33
You might want to take a look at HDF5 datasets or numpy.memmap. If you don't need to process all of the data at once, these tools let you process it in batches without loading everything into RAM.
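For numpy.memmap, a minimal sketch of the same write-then-read-in-batches idea might look like this. The file name and shape are hypothetical, and because a raw memmap file has no header, the same dtype and shape have to be passed again when reopening it:

```python
import os
import tempfile

import numpy as np

rows, columns = 20, 100
path = os.path.join(tempfile.gettempdir(), "mytestfile.dat")  # hypothetical file

# Create a disk-backed array; pages are only brought into RAM when touched.
mm = np.memmap(path, dtype=np.int32, mode="w+", shape=(rows, columns))
for i in range(rows):
    mm[i] = np.random.randint(0, 10, columns)  # write one row at a time
mm.flush()  # make sure the data is on disk
del mm      # close the writable mapping

# Reopen read-only; a raw memmap has no header, so repeat dtype and shape.
ro = np.memmap(path, dtype=np.int32, mode="r", shape=(rows, columns))
row_sums = [int(ro[i].sum()) for i in range(rows)]  # process row by row
print(len(row_sums))  # 20
```

If you'd rather have the shape and dtype stored in the file, np.lib.format.open_memmap creates a memory-mapped .npy file with a header.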
Creating and writing to an HDF5 file:
```python
import h5py
import numpy as np

rows = 20
columns = 100

with h5py.File("mytestfile.hdf5", "w") as f:
    # create a new dataset on disk
    dset = f.create_dataset("mydataset", (rows, columns), dtype='i')
    for i in range(rows):
        # write in batches (here, one row at a time)
        dset[i] = np.random.randint(0, 10, (columns))
```
Reading from the file:
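```python
import h5py

with h5py.File("mytestfile.hdf5", "r") as f:
    dset = f["mydataset"]
    for i in range(dset.shape[0]):
        # read in batches (here, one row at a time); only this row is loaded into RAM
        row = dset[i]
        # ... process row ...
```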
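Reading one row per iteration can be slow for large files, so a common variation is to read a block of rows per slice. A minimal sketch; `iter_batches` and `batch_size` are hypothetical names, and the helper only assumes the dataset has `.shape` and supports row slicing, which an h5py Dataset, a numpy.memmap, and a plain NumPy array all do:

```python
import numpy as np


def iter_batches(dset, batch_size):
    """Yield consecutive blocks of up to batch_size rows from dset."""
    n_rows = dset.shape[0]
    for start in range(0, n_rows, batch_size):
        stop = min(start + batch_size, n_rows)
        # For an h5py Dataset, this slice reads only these rows from disk.
        yield dset[start:stop]


# Demo with an in-memory array standing in for f["mydataset"].
data = np.arange(20 * 3).reshape(20, 3)
sizes = [batch.shape[0] for batch in iter_batches(data, batch_size=8)]
print(sizes)  # [8, 8, 4]
```

With the HDF5 file, this would be used inside the `with h5py.File(...)` block, e.g. `for batch in iter_batches(f["mydataset"], 1000): ...`.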
Upvotes: 1
Reputation: 33275
I first deleted variables that would be re-initialized in the next iteration by calling "del"
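If that variable is reassigned to a new value shortly afterwards, deleting it won't do much: rebinding the name releases the old object anyway (provided nothing else references it). At most, `del` frees the old object slightly earlier, before the new value is built.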
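A small sketch of the rebinding point, assuming CPython (whose reference counting destroys an object as soon as its last reference disappears). The `Big` class is a hypothetical stand-in for a large per-iteration variable:

```python
import weakref


class Big:
    """Hypothetical stand-in for a large per-iteration object (~1 MB)."""

    def __init__(self, n_bytes):
        self.data = bytearray(n_bytes)


freed = []

x = Big(1_000_000)
# Record the moment the first object is actually destroyed.
weakref.finalize(x, freed.append, "first")

x = Big(1_000_000)  # rebinding x drops the only reference to the first object
print(freed)  # in CPython: ['first'], with no del or gc.collect() needed
```

If the object is also referenced from somewhere else (for example a list that grows every iteration), neither `del` nor rebinding frees it, which is why growing containers are the usual culprit.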
I then read in the information/csv file after the for loop and obtained the information that way, instead of appending to a list every iteration in the for loop
If the end result is the same amount of information stored in variables, this won't help either: the data still ends up in RAM, just at a different point in the code.
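If the CSV is only needed for a summary, one way to keep memory flat is to stream it back row by row instead of loading it all at once. A minimal sketch using the standard `csv` module; the file name, column names, and the running-sum aggregate are hypothetical placeholders for whatever the loop actually computes:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "results.csv")  # hypothetical file

# Inside the loop: write one row per iteration instead of growing a list.
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["iteration", "value"])
    for i in range(10):
        value = i * i  # placeholder for the real per-iteration result
        writer.writerow([i, value])

# After the loop: stream the rows back, keeping only a running aggregate.
total = 0
with open(path, newline="") as f:
    for row in csv.DictReader(f):  # only one row is held in memory at a time
        total += int(row["value"])

print(total)  # 285
```

With pandas, `pd.read_csv(path, chunksize=...)` gives the same effect by returning the file in DataFrame chunks.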
Without seeing your actual code, all I can say is "your variables are too big".
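To find out which variables are too big, one option is the standard-library `tracemalloc` module: take a snapshot before and after a few iterations and compare them by source line. A rough sketch; `run_iteration` and the growing `history` list are hypothetical stand-ins for the real cell body:

```python
import tracemalloc


def run_iteration(i, history):
    """Hypothetical loop body: builds a large temporary and keeps part of it."""
    temp = [float(i)] * 100_000    # freed when the function returns
    history.append(temp[:10_000])  # survives: this is the growth to look for


tracemalloc.start()
history = []
before = tracemalloc.take_snapshot()

for i in range(20):
    run_iteration(i, history)

after = tracemalloc.take_snapshot()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Source lines whose net allocations grew the most between the two snapshots.
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)
print(f"current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")
```

Running a few iterations of the real loop this way should point at the line whose allocations keep growing.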
Upvotes: 1