Reputation: 890
I have fine-tuned a PyTorch transformer model using HuggingFace, and I'm trying to do inference on a GPU. However, even after setting model.eval(),
I still get slightly different outputs when I run inference multiple times on the same data.
I have tried a number of things and done some ablation analysis, and found that the only way to get deterministic output is to also set
torch.cuda.manual_seed_all(42)
(or any other seed number).
Why is this the case? And is this normal? The model's weights are fixed, and there are no undefined or randomly initialized weights (when I load the trained model I get the "All keys matched successfully" message), so where is the randomness coming from if I don't set the CUDA seed manually? Is this randomness to be expected?
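For reference, here is roughly what my inference code looks like (the checkpoint path is a placeholder, and the model class just illustrates the setup):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder path; substitute your fine-tuned checkpoint directory
tokenizer = AutoTokenizer.from_pretrained("./my-finetuned-model")
model = AutoModelForSequenceClassification.from_pretrained("./my-finetuned-model")
model.to("cuda")
model.eval()  # disables dropout and uses running batch-norm statistics

inputs = tokenizer("the same input every time", return_tensors="pt").to("cuda")

with torch.no_grad():
    out1 = model(**inputs).logits
    out2 = model(**inputs).logits

# Without torch.cuda.manual_seed_all(42), this can print False
print(torch.equal(out1, out2))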
Upvotes: 5
Views: 2773
Reputation: 24201
You can use torch.use_deterministic_algorithms
to force nondeterministic modules to perform deterministically, where supported. The nondeterminism typically comes from CUDA kernels that accumulate partial results with atomic operations: floating-point addition is not associative, so a different accumulation order across runs yields slightly different outputs. For example:
>>> a = torch.randn(100, 100, 100, device='cuda').to_sparse()
>>> b = torch.randn(100, 100, 100, device='cuda')
# Sparse-dense CUDA bmm is usually nondeterministic
>>> torch.bmm(a, b).eq(torch.bmm(a, b)).all().item()
False
>>> torch.use_deterministic_algorithms(True)
# Now torch.bmm gives the same result each time, but with reduced performance
>>> torch.bmm(a, b).eq(torch.bmm(a, b)).all().item()
True
# CUDA kthvalue has no deterministic algorithm, so it throws a runtime error
>>> torch.zeros(10000, device='cuda').kthvalue(1)
RuntimeError: kthvalue CUDA does not have a deterministic implementation...
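Note that on CUDA 10.2 or later, deterministic cuBLAS matmuls also require setting the CUBLAS_WORKSPACE_CONFIG environment variable before CUDA initializes; otherwise torch.use_deterministic_algorithms(True) raises a RuntimeError when such an op runs. A minimal sketch (the Linear layer here just stands in for your transformer):

import os
# Must be set before CUDA initializes (cuBLAS requirement on CUDA >= 10.2)
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch
torch.use_deterministic_algorithms(True)

# Stand-in for the fine-tuned model: any module doing CUDA matmuls
model = torch.nn.Linear(128, 64).cuda().eval()
x = torch.randn(8, 128, device="cuda")

with torch.no_grad():
    out1 = model(x)
    out2 = model(x)

print(torch.equal(out1, out2))  # True: repeated runs now match bit-for-bit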
Upvotes: 4