Andrew Chen

Why is the GPU much slower than the CPU in Google Colab?

I'm training an RNN on Google Colab, and this is my first time using a GPU to train a neural network. From my point of view, the GPU should be much faster than the CPU, and switching from CPU to GPU should only require adding .to('cuda') to the definition of the model/loss/variables and setting the Colab runtime to GPU.
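For concreteness, this is the kind of change I mean (a minimal sketch with an nn.Linear stand-in for my model; the shapes are made up):

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(8, 1).to(device)     # stand-in model; .to() moves its parameters to the GPU
criterion = nn.MSELoss().to(device)
x = torch.randn(100, 8).to(device)     # inputs must live on the same device as the model
y = torch.randn(100, 1).to(device)

loss = criterion(model(x), y)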

When I train it on the CPU, the average speed is 650 iterations/s:

[Screenshot: training on CPU in Google Colab]

But when I train it on the GPU, the average speed is only 340 iterations/s, about half the CPU speed:

[Screenshot: training on GPU in Google Colab]

and this happens on every epoch.

Here is my code.

import os
import torch
import torch.nn as nn
from tqdm import tqdm

# MyRNN and get_data are defined elsewhere in my notebook
def train(num_epoch=30, len_vocab=1, num_hidden=256, embedding_dim=8, batch_size=100):
    data = get_data()

    model = MyRNN(len_vocab, num_hidden, embedding_dim).to('cuda')  #here
    if os.path.exists('QingBinLi'):
        model.load_state_dict(torch.load('QingBinLi'))

    criterion = nn.MSELoss().to('cuda')  #here
    optimizer = torch.optim.Adam(model.parameters(), lr=0.1, weight_decay=1e-5)
    loss_for_draw = []
    model.train()
    data = data.detach().to('cuda')  #here

    for epoch in range(num_epoch + 1):

        h = torch.randn(1, batch_size, num_hidden).to('cuda')  #here
        loss_average = 0
        for i in tqdm(range(data.shape[-2] - batch_size)):
            optimizer.zero_grad()
            pre, h = model(data[:, :, i:i + batch_size, :].squeeze(0), h)
            h = h.detach()  # stop gradients from flowing into earlier windows
            pre = pre.unsqueeze(0).unsqueeze(0)
            loss = criterion(pre, data[:, :, i + 1:i + 1 + batch_size, :].squeeze(0))
            loss_average += loss.item()
            loss.backward()
            nn.utils.clip_grad_norm_(model.parameters(), max_norm=10)
            optimizer.step()

        loss_for_draw.append(loss_average / (data.shape[-2] - batch_size))
        torch.save(model.state_dict(), 'QingBinLi')
        print(f'now epoch:{epoch}, loss = {loss_for_draw[-1]}')

    return loss_for_draw

I just added .to('cuda') at the lines marked #here when trying to run it on the GPU.

So why is it much slower when I run my code on the GPU? Do I need to modify more of the code?

Upvotes: 2

Views: 6949

Answers (1)

Andrew Chen

My brother says that when the tensors are very big, say on the order of a million elements, the GPU can be faster than the CPU; otherwise the parallelism doesn't pay off, because the time isn't dominated by the tensor multiplies but by copying tensors to the device and other overhead like that.
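You can see this overhead directly by timing the copy and the multiply separately (a rough sketch; the sizes are arbitrary and the absolute numbers depend on which GPU Colab assigns; torch.cuda.synchronize() is needed because GPU kernels launch asynchronously):

import time
import torch

# time a host-to-device copy of a small tensor
x = torch.randn(256, 256)
torch.cuda.synchronize()
start = time.time()
for _ in range(1000):
    y = x.to('cuda')   # one host-to-device copy per iteration
torch.cuda.synchronize()
copy_time = (time.time() - start) / 1000

# time the multiply itself, with everything already on the device
a = torch.randn(256, 256, device='cuda')
torch.cuda.synchronize()
start = time.time()
for _ in range(1000):
    b = a @ a
torch.cuda.synchronize()
mul_time = (time.time() - start) / 1000

print(f'copy: {copy_time:.2e}s  multiply: {mul_time:.2e}s')

For tensors this small, the copy and kernel-launch cost per iteration is comparable to, or larger than, the multiply itself.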

My RNN has about 256×256 + 256×8 ≈ 67,000 parameters and batch_size is 100, so the tensors involved are far smaller than a million elements. That's why the GPU is much slower here.

And when I change my batch_size to 10000, the GPU runs at 145 iterations/s while the CPU manages only 15 iterations/s. This time the GPU is much faster.
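The same crossover shows up if you just scale the batch dimension of a single linear layer (again a sketch; the layer width 256 matches my num_hidden, the rest is made up):

import time
import torch
import torch.nn as nn

def iters_per_sec(batch_size, device, reps=200):
    layer = nn.Linear(256, 256).to(device)   # roughly the size of my hidden layer
    x = torch.randn(batch_size, 256, device=device)
    if device == 'cuda':
        torch.cuda.synchronize()   # wait for setup copies to finish
    start = time.time()
    for _ in range(reps):
        y = layer(x)
    if device == 'cuda':
        torch.cuda.synchronize()   # wait for all queued kernels to finish
    return reps / (time.time() - start)

# small batches: CPU wins; large batches: GPU wins
for bs in (100, 10000):
    print(bs, round(iters_per_sec(bs, 'cpu')), round(iters_per_sec(bs, 'cuda')))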

For a CNN with stride one, the GPU can perform on the order of filter_size × image_size × batch_size multiplications simultaneously, about 2,415,919,104 of them. In that kind of computation the GPU is much faster.
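For what it's worth, here is one combination of sizes that reproduces that count (these particular sizes are my own assumption, just to make the arithmetic concrete):

filter_size = 3 * 3 * 256   # a 3x3 kernel over 256 input channels
image_size = 32 * 32        # one 32x32 output feature map
batch_size = 1024

print(filter_size * image_size * batch_size)   # 2415919104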

Upvotes: 3
