user572780

Reputation: 379

How to free up RAM when using Jupyter Notebook?

I have a Jupyter Notebook where I am working with large matrices (20000x20000). I am running multiple iterations, but I get an error saying that I do not have enough RAM after every iteration. If I restart the kernel, I can run the next iteration, so **perhaps the Jupyter Notebook is running out of RAM because it keeps storing the variables (which aren't needed for the next iteration)**. Is there a way to free up RAM?

Edit: I don't know if the bold segment is correct. In any case, I am looking to free up RAM; any suggestions are welcome.

## Outputs:
import time
import numpy as np

## synthetic_data, sqdist, dense_laplacian, probit_optimization_eig and
## probit_optimization_cg are the author's own utility functions

two_moons_n_of_samples = [int(_) for _ in np.repeat(20000, 10)]
for i in range(len(two_moons_n_of_samples)):
    # print(f'n: {two_moons_n_of_samples[i]}')

    ## Generate the data and the graph
    X, ground_truth, fid = synthetic_data({'type': 'two_moons', 'n': two_moons_n_of_samples[i], 'fidelity': 60, 'sigma': 0.18})
    N = X.shape[0]
    dist_mat = sqdist(X.T, X.T)
    opt = {
        'graph': 'full',
        'tau': 0.004,
        'type': 's'
        }
    LS = dense_laplacian(dist_mat, opt)

    ## Eigenvalues and eigenvectors
    tic = time.time() ## Time how long to calculate eigenvalues/eigenvectors
    V, E = np.linalg.eigh(LS)
    idx = np.argsort(V)
    V, E = V[idx], E[:, idx]
    V = V / V.max()
    decomposition_time = time.time() - tic

    ## Initialize u0
    u0 = np.zeros(N)
    for j in range(len(fid[0])):
        u0[fid[0][j]] = 1
    for j in range(len(fid[1])):
        u0[fid[1][j]] = -1

    ## Initialize parameters
    dt = 0.05
    gamma = 0.07
    max_iter = 100

    ## Run MAP estimation
    tic = time.time()
    u_eg, _ = probit_optimization_eig(E, V, u0, dt, gamma, fid, max_iter)
    eg_time = time.time() - tic

    ## Run MAP estimation with CG
    tic2 = time.time()
    u_cg, _ = probit_optimization_cg(LS, u0, dt, gamma, fid, max_iter)
    cg_time = time.time() - tic2

    ## Write to file:
    with open('results2_two_moons_egvscg.txt', 'a') as f:
        f.write(f'{i},{two_moons_n_of_samples[i]},{decomposition_time + eg_time},{cg_time}\n')

Error:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
~\AppData\Local\Temp\2/ipykernel_2344/941022539.py in <module>
     11         'type': 's'
     12         }
---> 13     LS = dense_laplacian(dist_mat, opt)
     14 
     15     ## Eigenvalues and eigenvectors

C:/Users/\util\graph\dense_laplacian.py in dense_laplacian(dist_mat, opt)
     69         D_inv_sqrt = 1.0 / np.sqrt(D)
     70         D_inv_sqrt = np.diag(D_inv_sqrt)
---> 71         L = np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt
     72         # L = 0.5 * (L + L.T)
     73     if opt['type'] == 'rw':

MemoryError: Unable to allocate 1.07 GiB for an array with shape (12000, 12000) and data type float64


Upvotes: 5

Views: 12904

Answers (2)

dsgou

Reputation: 169

I would suggest adding more swap space, which is really easy and will probably save you more time and headache than redesigning the code to be less wasteful or trying to delete and garbage-collect unnecessary objects. It will of course be slower than using RAM, since the disk is used to simulate the extra memory needed. There is an excellent answer on how to do this on Ubuntu: link
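To check that the extra swap is actually visible to the machine running the kernel, a minimal sketch using the third-party psutil package (an assumption; it is not part of the standard library) can report current RAM and swap:

# Minimal sketch: report available RAM and swap (assumes psutil is installed).
import psutil

vm = psutil.virtual_memory()
sw = psutil.swap_memory()
print(f"RAM:  {vm.available / 1e9:.1f} GB free of {vm.total / 1e9:.1f} GB")
print(f"Swap: {sw.free / 1e9:.1f} GB free of {sw.total / 1e9:.1f} GB")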

Upvotes: 0

cshelly

Reputation: 615

I faced the same problem; the way I solved it was:

  1. Writing functions wherever preprocessing is required and returning only the preprocessed variables.
  2. Deleting huge variables once they are no longer needed: just use del x.
  3. Clearing garbage:

     import gc
     gc.collect()

  4. Sometimes clearing garbage doesn't help, and I used to clear the cache as well by using:

     import ctypes
     libc = ctypes.CDLL("libc.so.6")  # clearing cache
     libc.malloc_trim(0)

     (Points 2-4 are combined in the sketch after this list.)

  5. I tried to batch my code as far as possible.
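Applied to the loop in the question, a minimal sketch combining points 2-4 (assuming the same variable names as in the question) could be placed at the end of each iteration:

import ctypes
import gc

## ... end of one iteration of the loop from the question ...

# Point 2: drop references to the huge per-iteration arrays
del X, dist_mat, LS, V, E, u0, u_eg, u_cg

# Point 3: force a garbage-collection pass
gc.collect()

# Point 4: on Linux, additionally ask glibc to return freed heap pages to the OS
# (libc.so.6 does not exist on Windows, which the traceback above comes from)
libc = ctypes.CDLL("libc.so.6")
libc.malloc_trim(0)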

I think the best solution for you would be to batch the matrix multiplication. Libraries like TensorFlow and PyTorch do this by default; I'm not sure about NumPy, though. Check https://www.tensorflow.org/api_docs/python/tf/linalg/matmul (an API for matrix multiplication in batches). Most modern-day GPU calculations are possible due to batching!
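As an illustration of batching in plain NumPy, a pairwise squared-distance matrix can be built one block of rows at a time so that only a small slab of temporaries exists at once. This is only a sketch; the block size and the function name blockwise_sqdist are made up, and it is not the sqdist used in the question:

import numpy as np

def blockwise_sqdist(X, block=2000):
    ## Squared Euclidean distance matrix computed in row blocks; only one
    ## (block x n) slab of temporaries is alive at a time, which keeps the
    ## peak memory well below forming all n x n intermediates at once.
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    out = np.empty((n, n), dtype=X.dtype)
    for start in range(0, n, block):
        stop = min(start + block, n)
        cross = X[start:stop] @ X.T  # (block x n) cross terms for this slab
        out[start:stop] = sq[start:stop, None] - 2.0 * cross + sq[None, :]
    return out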

Upvotes: 10
