Reputation: 2156
I have the following PyTorch tensor:
import numpy as np
import torch

X = np.array([[1, 3, 2, 3], [2, 3, 5, 6]])
X = torch.FloatTensor(X).cuda()
I was wondering whether there is any difference (especially in speed) between Scenario A and Scenario B below when I chain multiple PyTorch operations in one line.
Scenario A:
X_sq_sum = (X**2).cuda().sum(dim = 1).cuda()
Scenario B:
X_sq_sum = (X**2).sum(dim = 1).cuda()
i.e. Scenario A has two .cuda() calls, whereas Scenario B has only one.
Many thanks in advance.
Upvotes: 0
Views: 460
Reputation: 36
They will perform equally, since the CUDA conversion is only done once (X is already in CUDA memory by the time either scenario runs). As described in the docs, a repeated .cuda() call is a no-op when the tensor is already on the GPU, so the extra call in Scenario A adds no overhead.
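As a minimal sketch (assuming a CUDA-capable GPU is available; the variable names Y and Z are just examples), you can check that .cuda() hands back the very same tensor when it is already on the GPU:

import torch

X = torch.FloatTensor([[1, 3, 2, 3], [2, 3, 5, 6]]).cuda()

Y = (X ** 2).sum(dim=1)   # result of a GPU op already lives in CUDA memory
Z = Y.cuda()              # no-op: Y is already on the GPU

print(Z is Y)             # True -> no copy was made, the original object is returned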
Upvotes: 2