pmdaly

Reputation: 1202

CUDA out of memory with matrix multiply

I'm trying to multiply 3 matrices, but am running out of CUDA memory.

import numpy as np
import torch

# A: 3000 x 100 (~2.4MB as float64)
# B: 100 x 100  (~0.08MB)
# C: 100 x 3M   (~2.4GB)

A = np.random.randn(3000, 100)
B = np.random.randn(100, 100)
C = np.random.randn(100, 3_000_000)  # randn needs an int size, not 3e6

A_gpu = torch.from_numpy(A).cuda()
B_gpu = torch.from_numpy(B).cuda()
C_gpu = torch.from_numpy(C).cuda()

R_gpu = (A_gpu @ B_gpu @ C_gpu)

CUDA is requesting about 90GB of memory for this operation, and I don't understand why.

Upvotes: 0

Views: 979

Answers (1)

Shai

Reputation: 114786

Multiplying these matrices, your output is going to be a 3,000 x 3,000,000 matrix! So despite A and B being relatively small, the output R is HUGE: 9G elements. Moreover, I suspect the dtype of your matrices is float64 rather than float32, because np.random.randn returns float64 by default. Therefore, each of the 9G elements of R_gpu requires 8 bytes, bringing you to at least 72GB of GPU memory for R_gpu alone. I suspect intermediate results and some other allocations occupy a little more of your GPU memory.
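A quick back-of-the-envelope check of those numbers, plus a minimal sketch of one way around the problem: stream C through the GPU in column chunks and collect the result in host RAM. `matmul_chunked` and the chunk size of 100,000 are my own illustrative choices, not from the question; tune the chunk size to your GPU's free memory.

```python
import numpy as np
import torch

# Why ~90GB: R = (A @ B) @ C has 3000 x 3,000,000 entries, and
# np.random.randn produces float64 (8 bytes per entry).
elements = 3000 * 3_000_000              # 9 billion elements
print(elements * 8 / 1e9)                # 72.0 -> ~72GB for R in float64
print(elements * 4 / 1e9)                # 36.0 -> ~36GB even in float32

def matmul_chunked(A, B, C, chunk=100_000, device=None):
    """Compute A @ B @ C without ever materializing the huge result R
    (or all of C) on the GPU: process C in column chunks and write each
    chunk of the result into a host-memory array."""
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    # A @ B is tiny (3000 x 100 in the question), so compute it up front.
    AB = torch.from_numpy(A).to(device) @ torch.from_numpy(B).to(device)
    R = np.empty((A.shape[0], C.shape[1]), dtype=A.dtype)
    for s in range(0, C.shape[1], chunk):
        c = torch.from_numpy(C[:, s:s + chunk]).to(device)
        R[:, s:s + chunk] = (AB @ c).cpu().numpy()
    return R
```

Note that at the question's sizes R still needs ~72GB of host RAM in float64 (~36GB if you cast the inputs to float32 first), but GPU memory usage stays bounded by one chunk of C plus one chunk of R.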

Upvotes: 1
