Reputation: 890
I have fine-tuned a PyTorch transformer model using HuggingFace, and I'm trying to do inference on a GPU. However, even after setting model.eval(),
I still get slightly different outputs when I run inference multiple times on the same data.
I have tried a number of things and done some ablation analysis, and found that the only way to get deterministic output is to also set
torch.cuda.manual_seed_all(42)
(or any other seed number).
Why is this the case? And is this normal? The model's weights are fixed, and there are no undefined or randomly initialized weights (when I load the trained model I get the "All keys matched successfully" message), so where is the randomness coming from if I don't set the CUDA seed manually? Is this randomness to be expected?
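For reference, here is roughly what my inference code looks like (the checkpoint path is a placeholder, and the model class just illustrates the setup):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder path; substitute your fine-tuned checkpoint directory
tokenizer = AutoTokenizer.from_pretrained("./my-finetuned-model")
model = AutoModelForSequenceClassification.from_pretrained("./my-finetuned-model")
model.to("cuda")
model.eval()  # disables dropout and uses running batch-norm statistics

inputs = tokenizer("the same input every time", return_tensors="pt").to("cuda")

with torch.no_grad():
    out1 = model(**inputs).logits
    out2 = model(**inputs).logits

# Without torch.cuda.manual_seed_all(42), this can print False
print(torch.equal(out1, out2))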
Upvotes: 5
Views: 2773
Reputation: 24201
You can use torch.use_deterministic_algorithms
to force nondeterministic modules to perform deterministically, where supported. The nondeterminism typically comes from CUDA kernels that accumulate partial results with atomic operations: floating-point addition is not associative, so a different accumulation order across runs yields slightly different outputs. For example:
>>> a = torch.randn(100, 100, 100, device='cuda').to_sparse()
>>> b = torch.randn(100, 100, 100, device='cuda')
# Sparse-dense CUDA bmm is usually nondeterministic
>>> torch.bmm(a, b).eq(torch.bmm(a, b)).all().item()
False
>>> torch.use_deterministic_algorithms(True)
# Now torch.bmm gives the same result each time, but with reduced performance
>>> torch.bmm(a, b).eq(torch.bmm(a, b)).all().item()
True
# CUDA kthvalue has no deterministic algorithm, so it throws a runtime error
>>> torch.zeros(10000, device='cuda').kthvalue(1)
RuntimeError: kthvalue CUDA does not have a deterministic implementation...
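Note that on CUDA 10.2 or later, deterministic cuBLAS matmuls also require setting the CUBLAS_WORKSPACE_CONFIG environment variable before CUDA initializes; otherwise torch.use_deterministic_algorithms(True) raises a RuntimeError when such an op runs. A minimal sketch (the Linear layer here just stands in for your transformer):

import os
# Must be set before CUDA initializes (cuBLAS requirement on CUDA >= 10.2)
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch
torch.use_deterministic_algorithms(True)

# Stand-in for the fine-tuned model: any module doing CUDA matmuls
model = torch.nn.Linear(128, 64).cuda().eval()
x = torch.randn(8, 128, device="cuda")

with torch.no_grad():
    out1 = model(x)
    out2 = model(x)

print(torch.equal(out1, out2))  # True: repeated runs now match bit-for-bit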
Upvotes: 4