Muhammad Arslan

Reputation: 155

Unable to allocate GPU memory when there is enough cached memory

I am training a VGG16 model from scratch on an AWS EC2 Deep Learning AMI machine (Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-1054-aws x86_64)) with Python 3 (CUDA 10.1 and Intel MKL) (PyTorch 1.3.1), and I am facing the error below while updating model parameters.

RuntimeError: CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 11.17 GiB total capacity; 10.76 GiB already allocated; 4.81 MiB free; 119.92 MiB cached)

Code for updating parameters:

def _update_fisher_params(self, current_ds, batch_size, num_batch):
    # assumes: import torch, import torch.nn.functional as F,
    # from torch import autograd, from torch.utils.data import DataLoader
    dl = DataLoader(current_ds, batch_size, shuffle=True)
    log_likelihoods = []
    for i, (input, target) in enumerate(dl):
        if i > num_batch:
            break
        # forward pass: per-class log-probabilities for the batch
        output = F.log_softmax(self.model(input.cuda().float()), dim=1)
        log_likelihoods.append(output[:, target])  # <-- OOM here after 157 iterations
    log_likelihood = torch.cat(log_likelihoods).mean()
    grad_log_likelihood = autograd.grad(log_likelihood, self.model.parameters())
    # store the squared gradients as estimated Fisher information buffers
    _buff_param_names = [param[0].replace('.', '__') for param in self.model.named_parameters()]
    for _buff_param_name, param in zip(_buff_param_names, grad_log_likelihood):
        self.model.register_buffer(_buff_param_name + '_estimated_fisher', param.data.clone() ** 2)

After debugging I found that the log_likelihoods.append(output[:, target]) line throws the error after 157 iterations.

I have the required memory, but PyTorch fails to allocate it. I do not understand why updating the gradients is causing a memory problem, as gradients should be de-referenced and released automatically on each iteration. Any idea?
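The growth is easy to reproduce in isolation. Below is a minimal sketch (a hypothetical toy model, not my actual VGG16 setup) that prints allocated GPU memory while outputs are accumulated in a list the same way:

import torch

# Toy stand-in for the real network: several layers, so each forward
# pass saves several intermediate activations for the backward pass.
net = torch.nn.Sequential(*[torch.nn.Linear(2048, 2048) for _ in range(8)]).cuda()

kept = []
for i in range(100):
    out = net(torch.randn(64, 2048, device='cuda'))
    kept.append(out)  # 'out' still references its whole autograd graph
    if i % 20 == 0:
        print(i, torch.cuda.memory_allocated() // 2**20, 'MiB')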

I have tried the following solutions, but no luck.

Machine Specs:

[screenshot of machine specs]

Upvotes: 3

Views: 666

Answers (1)

Muhammad Arslan

Reputation: 155

Finally I solved the memory problem! I realized that in each iteration I was putting the input data into a new tensor, and PyTorch was generating a new computation graph. That caused the used GPU memory to grow forever. Then I used the .detach() function, and the GPU memory usage now always stays at a low level.

self.model(input.cuda().float()).detach().requires_grad_(True)
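For context, here is the same toy loop from the question with the fix applied (again a hypothetical sketch, not the exact training code): the output values are kept, the graph is dropped, and allocated memory stays flat:

import torch

# Same hypothetical toy model as above, but each output is detached
# before being stored, so its autograd graph can be freed immediately.
net = torch.nn.Sequential(*[torch.nn.Linear(2048, 2048) for _ in range(8)]).cuda()

kept = []
for i in range(100):
    out = net(torch.randn(64, 2048, device='cuda'))
    # .detach() keeps the values but drops the graph; .requires_grad_(True)
    # turns the stored tensor into a fresh leaf, in case gradients with
    # respect to it (not the model's parameters) are needed later.
    kept.append(out.detach().requires_grad_(True))
    if i % 20 == 0:
        print(i, torch.cuda.memory_allocated() // 2**20, 'MiB')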

Upvotes: 2
