Reputation: 277
I am working with PyTorch in Colab.
While training, PyTorch consumes an enormous amount of GPU memory.
After training, I saved the model and loaded it into another notebook (notebook 2).
In notebook 2, after loading the state_dict and everything else, PyTorch consumes far less memory than it did during training.
So I wonder whether 'useless' data is being kept in GPU memory while training (in my case, about 13 GB).
If so, how do I delete that useless data after training?
P.S. I tried deleting the variables used during training, but that only freed about 2 GB.
Upvotes: 1
Views: 138
Reputation: 3272
This is to be expected while training. During the training process, the operations themselves will take up memory.
For example, consider the following operation -
a = np.random.rand(100, 500, 300)
b = np.random.rand(200, 500, 300)
c = (a[:, None, :, :] * b[None, :, :, :]).sum(-1).sum(-1)
The memory footprint of a, b and c combined is under 400 MB (a is 120 MB, b is 240 MB, and c is tiny). However, if you check
%memit (a[:, None, :, :] * b[None, :, :, :]).sum(-1).sum(-1)
That's 23 GB! The line itself takes up that much memory while it runs, because the broadcasted multiply materializes a massive intermediate array before the two sums collapse it. That temporary is automatically freed once the operation finishes, so deleting some of your own variables afterwards isn't going to do much for reducing the footprint.
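You can do the size arithmetic yourself to see where the memory goes. Here is a quick check using the same shapes as above; it only computes the sizes, the 24 GB temporary is never actually allocated:
import numpy as np

a = np.random.rand(100, 500, 300)
b = np.random.rand(200, 500, 300)
print(a.nbytes / 1e6, "MB")   # 120.0 MB
print(b.nbytes / 1e6, "MB")   # 240.0 MB

# The broadcasted product a[:, None, :, :] * b[None, :, :, :] materializes a
# temporary of shape (100, 200, 500, 300) before the sums collapse it:
temp_bytes = 100 * 200 * 500 * 300 * 8   # float64 -> 8 bytes per element
print(temp_bytes / 1e9, "GB")            # 24.0 GB, just for the temporary

# The final result c is only (100, 200) -> about 0.16 MB.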
The way to get around this is to use memory-optimized operations. For example, doing
np.tensordot(a, b, ((1, 2), (1, 2)))
instead of multiplying by broadcasting gives the same result with a much smaller memory footprint.
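You can convince yourself the two are equivalent before swapping it into real code. A sketch, with the arrays shrunk so the broadcasted version actually fits in memory:
import numpy as np

a = np.random.rand(10, 50, 30)
b = np.random.rand(20, 50, 30)

# Broadcasting version: materializes a (10, 20, 50, 30) temporary.
broadcast = (a[:, None, :, :] * b[None, :, :, :]).sum(-1).sum(-1)

# tensordot version: contracts axes 1 and 2 of both arrays directly.
tensor = np.tensordot(a, b, ((1, 2), (1, 2)))

print(broadcast.shape, tensor.shape)   # (10, 20) (10, 20)
print(np.allclose(broadcast, tensor))  # True: same numbers, no giant temporary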
So what you need to do is identify which operation in your code requires such a huge amount of memory and see whether you can replace it with a more memory-efficient equivalent (which may not even be possible, depending on your specific use case).
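Since you are in PyTorch, the same idea carries over there. A sketch with made-up shapes (not your actual model code): a broadcast-multiply-and-sum can often be rewritten with torch.einsum, which computes the contraction without building the big intermediate tensor:
import torch

a = torch.rand(10, 50, 30, dtype=torch.float64)
b = torch.rand(20, 50, 30, dtype=torch.float64)

# Broadcasting version: materializes a (10, 20, 50, 30) temporary.
heavy = (a[:, None, :, :] * b[None, :, :, :]).sum(-1).sum(-1)

# einsum version: same result, no big temporary.
light = torch.einsum('ikl,jkl->ij', a, b)

print(torch.allclose(heavy, light))   # True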
Upvotes: 1