I have previously trained a VGG model (say model1) and a two-layer model (say model2) separately. Now I have to train a new model that combines those two models, with each part of the new model initialized from the learned weights of model1 and model2. I implemented this as follows:
class TransferModel(nn.Module):
    def __init__(self, VGG, TwoLayer):
        super(TransferModel, self).__init__()
        self.vgg_layer = VGG
        self.linear = TwoLayer
        for param in self.vgg_layer.parameters():
            param.requires_grad = True

    def forward(self, x):
        h1_vgg = self.vgg_layer(x)
        y_pred = self.linear(h1_vgg)
        return y_pred

# for image_id in train_ids[0:1]:
#     img = load_image(train_id_to_file[image_id])

new_model = TransferModel(trained_vgg_instance, trained_twolayer_instance)
new_model.linear.load_state_dict(trained_twolayer_instance.state_dict())
new_model.vgg_layer.load_state_dict(trained_vgg_instance.state_dict())
new_model.cuda()
And when training, I try:
def train(model, learning_rate=0.001, batch_size=50, epochs=2):
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    criterion = torch.nn.MultiLabelSoftMarginLoss()
    x = torch.zeros([batch_size, 3, img_size, img_size])
    y_true = torch.zeros([batch_size, 4096])
    for epoch in range(epochs):  # loop over the dataset multiple times
        running_loss = 0.0
        shuffled_indcs = torch.randperm(20000)
        for i in range(20000):
            for batch_num in range(int(20000 / batch_size)):
                optimizer.zero_grad()
                for j in range(batch_size):
                    # ... some code to load batches of images into x....
                x_batch = Variable(x).cuda()
                print(batch_num)
                y_true_batch = Variable(train_labels[batch_num * batch_size:(batch_num + 1) * batch_size, :]).cuda()
                y_pred = model(x_batch)
                loss = criterion(y_pred, y_true_batch)
                loss.backward()
                optimizer.step()
                running_loss += loss
                del x_batch, y_true_batch, y_pred
                torch.cuda.empty_cache()
        print("in epoch[%d] = %.8f " % (epoch, running_loss / (batch_num + 1)))
        running_loss = 0.0
    print('Finished Training')

train(new_model)
In the second iteration (batch_num=1) of the first epoch, I get this error:
CUDA out of memory. Tried to allocate 153.12 MiB (GPU 0; 5.93 GiB total capacity; 4.83 GiB already allocated; 66.94 MiB free; 374.12 MiB cached)
Although I explicitly use del in my training loop, running nvidia-smi shows that it doesn't seem to do anything and the memory isn't being freed.
What should I do?
Change this line:
running_loss += loss
to this:
running_loss += loss.item()
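In the context of your inner loop (same variable names as in your snippet), the only change is on the accumulation line:

y_pred = model(x_batch)
loss = criterion(y_pred, y_true_batch)
loss.backward()
optimizer.step()
running_loss += loss.item()  # accumulate a plain Python float, not the graph-attached tensor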
By adding loss to running_loss, you are telling pytorch to keep all the gradients with respect to loss for that batch in memory, even when you start training on the next batch. Pytorch thinks that maybe you will want to use running_loss in some big loss function over multiple batches later, and therefore keeps all the gradients (and therefore activations) for all batches in memory.
By adding .item() you just get the loss as a python float, rather than a torch.FloatTensor. This float is detached from the pytorch graph and thus pytorch knows you don't want gradients with respect to it.
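A quick way to see the difference (a minimal standalone sketch, not your model):

import torch

w = torch.randn(3, requires_grad=True)
loss = (w * 2).sum()               # 0-dim tensor, still attached to the autograd graph
print(type(loss), loss.grad_fn)    # <class 'torch.Tensor'> <SumBackward0 object at ...>
print(type(loss.item()))           # <class 'float'> -- no graph, no gradients kept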
If you are running an older version of pytorch without .item(), you can try:
running_loss += float(loss.cpu().detach())
This could also be caused by a similar bug in a test() loop, if you have one.
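For example, an evaluation loop along these lines (a rough sketch; model, criterion and a loader of (x, y) batches stand in for whatever you use) never keeps the graph around:

def test(model, loader, criterion):
    model.eval()
    total_loss = 0.0
    with torch.no_grad():          # no autograd graph is built during evaluation
        for x, y in loader:
            x, y = x.cuda(), y.cuda()
            y_pred = model(x)
            total_loss += criterion(y_pred, y).item()  # .item() here as well
    return total_loss / len(loader)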