Amir

Reputation: 11096

How to deal with GPU memory leakage issues in Torch?

My machine's GPU has 2 GB of memory. When I run the following code for the first time, I get no errors. However, the second time I run it I get a memory error. As a short-term remedy, the only thing I can do is cast the data to float32 with torch.Tensor.float(). The underlying problem persists, though: the occupied memory is not released when the process finishes, nor when the process is terminated while running. The same is true for the machine's RAM. How should one prevent memory leakage in Torch, or release the memory?

require 'nn'
require 'image'
require 'cunn'
require 'paths'



collectgarbage(); collectgarbage()
if (not paths.filep("cifar10torchsmall.zip")) then
    os.execute('wget -c https://s3.amazonaws.com/torch7/data/cifar10torchsmall.zip')
    os.execute('unzip cifar10torchsmall.zip')
end
trainset = torch.load('cifar10-train.t7')
testset = torch.load('cifar10-test.t7')
classes = {'airplane', 'automobile', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck'}

setmetatable(trainset, 
    {__index = function(t, i) 
                    return {t.data[i], t.label[i]} 
                end}
);
trainset.data = trainset.data:double() -- convert the data from a ByteTensor to a DoubleTensor.

function trainset:size() 
    return self.data:size(1) 
end

mean = {} -- store the mean, to normalize the test set in the future
stdv  = {} -- store the standard-deviation for the future
for i=1,3 do -- over each image channel
    mean[i] = trainset.data[{ {}, {i}, {}, {}  }]:mean() -- mean estimation
    print('Channel ' .. i .. ', Mean: ' .. mean[i])
    trainset.data[{ {}, {i}, {}, {}  }]:add(-mean[i]) -- mean subtraction

    stdv[i] = trainset.data[{ {}, {i}, {}, {}  }]:std() -- std estimation
    print('Channel ' .. i .. ', Standard Deviation: ' .. stdv[i])
    trainset.data[{ {}, {i}, {}, {}  }]:div(stdv[i]) -- std scaling
end


testset.data = testset.data:double()   -- convert from Byte tensor to Double tensor
for i=1,3 do -- over each image channel
    testset.data[{ {}, {i}, {}, {}  }]:add(-mean[i]) -- mean subtraction    
    testset.data[{ {}, {i}, {}, {}  }]:div(stdv[i]) -- std scaling
end

trainset.data = trainset.data:cuda()
testset.data = testset.data:cuda()

net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 6, 5, 5)) -- 3 input image channels, 6 output channels, 5x5 convolution kernel
net:add(nn.ReLU())                       -- non-linearity 
net:add(nn.SpatialMaxPooling(2,2,2,2))     -- A max-pooling operation that looks at 2x2 windows and finds the max.
net:add(nn.SpatialConvolution(6, 16, 5, 5))
net:add(nn.ReLU())                       -- non-linearity 
net:add(nn.SpatialMaxPooling(2,2,2,2))
net:add(nn.View(16*5*5))                    -- reshapes from a 3D tensor of 16x5x5 into 1D tensor of 16*5*5
net:add(nn.Linear(16*5*5, 120))             -- fully connected layer (matrix multiplication between input and weights)
net:add(nn.ReLU())                       -- non-linearity 
net:add(nn.Linear(120, 84))
net:add(nn.ReLU())                       -- non-linearity 
net:add(nn.Linear(84, 10))                   -- 10 is the number of outputs of the network (the 10 CIFAR-10 classes)
net:add(nn.LogSoftMax())  
net = net:cuda()

criterion = nn.ClassNLLCriterion()
criterion = criterion:cuda()



pred = net:forward(trainset.data)
outputEr = criterion:forward(pred, trainset.label:cuda())
net:zeroGradParameters()
outputGrad = criterion:backward(pred, trainset.label:cuda())
collectgarbage()
inputGrad = net:backward(trainset.data, outputGrad)
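
For reference, the short-term float32 cast I mentioned above looks roughly like this (a minimal sketch; it would replace the :double() conversions in the code above, and the main saving is in host RAM, since :cuda() produces 32-bit CudaTensors either way):

-- keep the converted data in 32-bit floats instead of 64-bit doubles
trainset.data = trainset.data:float()
testset.data  = testset.data:float()

-- :cuda() then copies them to the GPU as torch.CudaTensor (32-bit float)
trainset.data = trainset.data:cuda()
testset.data  = testset.data:cuda()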

Side question: Why does Torch initialize the network parameters as doubles, even though GPUs are quite slow at double-precision arithmetic and 64-bit parameter values are unnecessary for almost all neural network applications? How can I initialize a model with float (32-bit) parameter vectors?

I found the answer to the side question. You can easily make Torch's default tensor type float by putting the following at the beginning of your code:

torch.setdefaulttensortype('torch.FloatTensor')
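
With that line at the top of the script, newly constructed modules get 32-bit parameters. A minimal sketch illustrating the effect (the nn.Linear layer here is just an example, not part of the code above):

require 'nn'
torch.setdefaulttensortype('torch.FloatTensor')

local fc = nn.Linear(10, 5)
print(torch.type(fc.weight))  -- prints torch.FloatTensor instead of torch.DoubleTensor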

Upvotes: 4

Views: 1684

Answers (1)

Amir

Reputation: 11096

I was able to resolve the issue (almost) by upgrading from CUDA 6.5 to CUDA 7.5 on the machine on which I was running the above experiments. Now, most of the time when the program crashes while running, the GPU memory gets released. However, sometimes it still does not, and I have to restart the machine.

Also, I would do the following to make sure the program releases the GPU memory when it finishes successfully:

-- drop every reference to tensors and modules that live on the GPU
net = nil
trainset = nil
testset = nil
pred = nil
inputGrad = nil
criterion = nil

-- let Lua's garbage collector free the underlying CUDA storages
collectgarbage()
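
To check that the memory actually comes back, you can compare the free GPU memory before and after the cleanup. This is a sketch under the assumption that your cutorch build provides cutorch.getMemoryUsage (it returns free and total device memory in bytes):

require 'cutorch'

-- free memory before dropping references (hypothetical check, not in the original code)
local freeBefore = cutorch.getMemoryUsage(cutorch.getDevice())

net, trainset, testset, pred, inputGrad, criterion = nil, nil, nil, nil, nil, nil
collectgarbage()

-- free memory after the garbage collector has released the CUDA storages
local freeAfter = cutorch.getMemoryUsage(cutorch.getDevice())
print(string.format('free GPU memory: %.1f MB -> %.1f MB', freeBefore / 2^20, freeAfter / 2^20))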

Upvotes: 2
