Reputation: 1844
I'm using the multiprocessing package in PyTorch to split training across multiple processes. My x and y train and test data are CUDA tensors. I'm trying to understand the difference between using tensor.share_memory_() and a multiprocessing.Queue to share CUDA tensors. Which is preferred, and why?
Here's my current code using tensor.share_memory_(). What changes should I make?
import torch
import torch.nn as nn
import torch.optim as optim
import torch.multiprocessing as mp
from sklearn.model_selection import train_test_split

def train(model, features, target, epochs=1000):
    X_train, x_test, Y_train, y_test = train_test_split(features,
                                                        target,
                                                        test_size=0.4,
                                                        random_state=0)
    # Move the train/test tensors into shared memory
    Xtrain_ = torch.from_numpy(X_train.values).float().share_memory_()
    Xtest_ = torch.from_numpy(x_test.values).float().share_memory_()
    Ytrain_ = (torch.from_numpy(Y_train.values).view(1, -1)[0]).share_memory_()
    Ytest_ = (torch.from_numpy(y_test.values).view(1, -1)[0]).share_memory_()
    optimizer = optim.Adam(model.parameters(), lr=0.01)
    loss_fn = nn.NLLLoss()
    for epoch in range(epochs):
        # training code here
        pass
# target method ends here

if __name__ == '__main__':  # guard is required with the 'spawn' start method
    mp.set_start_method('spawn')
    model = Net()
    model.share_memory()  # share the model's parameters across processes
    processes = []
    for rank in range(1):
        p = mp.Process(target=train, args=(model, features, target))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
Env details: Python 3 on Linux
Upvotes: 2
Views: 4509
Reputation: 8428
They are the same. torch.multiprocessing.Queue uses tensor.share_memory_() internally.
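A minimal sketch of that equivalence with a CPU tensor (the worker function and tensor size here are illustrative, not from your code): a tensor put on a torch.multiprocessing.Queue ends up backed by shared memory, just as if you had called share_memory_() on it yourself, so in-place writes in the child are visible to the parent.

import torch
import torch.multiprocessing as mp

def worker(q):
    t = q.get()           # arrives backed by shared memory
    print(t.is_shared())  # True: the Queue's pickler moved it there
    t += 1                # in-place update is visible to the parent

if __name__ == '__main__':
    mp.set_start_method('spawn')
    q = mp.Queue()
    t = torch.zeros(5)
    print(t.is_shared())  # False: still ordinary process-local memory
    q.put(t)              # serializing for the queue moves the storage
                          # into shared memory, same as t.share_memory_()
    p = mp.Process(target=worker, args=(q,))
    p.start()
    p.join()
    print(t.is_shared())  # True now
    print(t)              # reflects the child's in-place update

Running this should print False, then True in the child, then True and a tensor of ones in the parent, which is exactly the behavior you'd get by calling share_memory_() before handing the tensor to the process.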
Upvotes: 4