Reputation: 71
I'm training a neural network with two hidden layers of 100 nodes each, four inputs, one output, and a batch size of 32, and I'm seeing no speed improvement using the GPU vs. the CPU. I only have a limited data set (1067 samples, all copied to the GPU at the beginning), but I would have thought the 34 batches per epoch could run in parallel, more than making up for the time spent copying to the GPU. Is my data set too small, or is there potentially some other issue? Here is my code snippet:
import torch

def train_for_regression(X, T):
    BATCH_SIZE = 32
    n_epochs = 1000
    learning_rate = 0.01

    device = torch.device("cuda:0")
    Xt = torch.from_numpy(X).float().to(device)  # training inputs: 1067 samples, 4 features each
    Tt = torch.from_numpy(T).float().to(device)  # training targets: 1067 samples, 1 output each

    nnet = torch.nn.Sequential(torch.nn.Linear(4, 100),
                               torch.nn.Tanh(),
                               torch.nn.Linear(100, 100),
                               torch.nn.Tanh(),
                               torch.nn.Linear(100, 1))
    nnet.to(device)

    mse_f = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(nnet.parameters(), lr=learning_rate)

    for epoch in range(n_epochs):
        for i in range(0, len(Xt), BATCH_SIZE):
            batch_Xt = Xt[i:i+BATCH_SIZE, :]
            batch_Tt = Tt[i:i+BATCH_SIZE, :]
            optimizer.zero_grad()
            Y = nnet(batch_Xt)
            mse = mse_f(Y, batch_Tt)
            mse.backward()
            optimizer.step()

    return nnet
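For reference, this is roughly how I'm timing the comparison. It's just a sketch: the arrays are random placeholders shaped like my real data, and it assumes train_for_regression is modified to take a device argument so the same code can run on either device. The torch.cuda.synchronize() call is there because CUDA kernels launch asynchronously, so the clock would otherwise stop before the GPU has finished.

import time
import numpy as np
import torch

X = np.random.rand(1067, 4).astype(np.float32)  # placeholder inputs, rows are samples
T = np.random.rand(1067, 1).astype(np.float32)  # placeholder targets

def time_training(device_str):
    start = time.perf_counter()
    train_for_regression(X, T, device=torch.device(device_str))  # assumes an added device parameter
    if device_str.startswith("cuda"):
        torch.cuda.synchronize()  # wait for queued GPU work before reading the clock
    return time.perf_counter() - start

print("CPU time:", time_training("cpu"))
print("GPU time:", time_training("cuda:0"))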
Upvotes: 0
Views: 977
Reputation: 11807
Chances are the time required to get the data onto the GPU negates the benefit of the GPU. On top of that, the network is so small that the CPU is efficient enough on its own, so the speedup from the GPU shouldn't be that big.
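To get a feel for the scale (just a sketch, not measurements from your setup): the largest multiply in one of your batches is roughly a 32x100 matrix times a 100x100 weight matrix, and at that size per-step overheads such as CUDA kernel launches tend to dominate, so the GPU has little room to win. You can time that single multiply on both devices to see it:

import time
import torch

def bench_matmul(device, n_steps=1000):
    x = torch.randn(32, 100, device=device)   # one batch of hidden activations
    w = torch.randn(100, 100, device=device)  # one hidden-layer weight matrix
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_steps):
        _ = x @ w
    if device.type == "cuda":
        torch.cuda.synchronize()  # kernels are asynchronous; wait before stopping the clock
    return (time.perf_counter() - start) / n_steps

print("CPU per matmul:", bench_matmul(torch.device("cpu")))
if torch.cuda.is_available():
    print("GPU per matmul:", bench_matmul(torch.device("cuda:0")))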
Also, a GPU is used for parallel matrix computations within a single step, in this case multiplying one batch's data by the weights of the network. Separate batches are not processed in parallel unless you take extra steps, like using additional libraries and/or multiple GPUs.
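If you want to give the GPU more parallel work without extra libraries, the usual first step is simply a larger batch size, so each step becomes one bigger matrix multiply. As a sketch, reusing the variable names from your function, the two nested loops could become a full-batch loop (keep in mind this changes the optimization dynamics, since you take far fewer gradient steps per epoch):

# replaces the batch loop inside train_for_regression
for epoch in range(n_epochs):
    optimizer.zero_grad()
    Y = nnet(Xt)          # one forward pass over all 1067 samples at once
    mse = mse_f(Y, Tt)
    mse.backward()
    optimizer.step()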
Upvotes: 1