Reputation: 101
I am running a silly benchmark comparing numpy and pytorch (cpu + gpu). I can't understand why the GPU is so much slower. To exclude the overhead of moving arrays back and forth between cpu and gpu, the timing covers only the linalg call. Any tips?
import torch
import numpy as np
import time
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
print(f"Is CUDA supported by this system? {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
# Storing ID of current CUDA device
cuda_id = torch.cuda.current_device()
print(f"ID of current CUDA device: {torch.cuda.current_device()}")
print(f"Name of current CUDA device: {torch.cuda.get_device_name(cuda_id)}")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
size=10000
real=1
A=np.random.rand(size,size)
b=np.random.rand(size,1)
start_time = time.time()
for t in range(real):
    x_np=np.linalg.solve(A,b)
print("NUMPY CPU--- %s seconds ---" % (time.time() - start_time))
A = torch.from_numpy(A)
b = torch.from_numpy(b)
start_time = time.time()
for t in range(real):
    x_torch = torch.linalg.solve(A, b)
print("PYTORCH CPU--- %s seconds ---" % (time.time() - start_time))
A = A.to(device)
b = b.to(device)
start_time = time.time()
for t in range(real):
    x_torch = torch.linalg.solve(A, b)
print("PYTORCH GPU--- %s seconds ---" % (time.time() - start_time))
The results are as follows (I only ran one iteration).
Is CUDA supported by this system? True
CUDA version: 12.1
ID of current CUDA device: 0
Name of current CUDA device: NVIDIA GeForce RTX 3060
NUMPY CPU--- 1.6754064559936523 seconds ---
PYTORCH CPU--- 1.3463587760925293 seconds ---
PYTORCH GPU--- 3.8940138816833496 seconds ---
Upvotes: 0
Views: 172
Reputation: 101
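I figured it out. I converted the tensors to float32 and now it makes sense. The original arrays were float64 (the NumPy default), and consumer GPUs such as the RTX 3060 have much lower float64 throughput than float32.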
NUMPY CPU--- 1.626929759979248 seconds ---, float64
PYTORCH CPU--- 1.3480534553527832 seconds ---, torch.float64
PYTORCH GPU--- 4.0684168338775635 seconds ---, torch.float64
NUMPY CPU--- 2.4209649562835693 seconds ---, float32
PYTORCH CPU--- 0.6886072158813477 seconds ---, torch.float32
PYTORCH GPU--- 0.2805318832397461 seconds ---, torch.float32
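For anyone repeating this, here is a minimal sketch of the float64 vs float32 comparison. It is not the exact script that produced the numbers above: `time_solve` is a hypothetical helper, the matrix size is reduced so it runs quickly, and the warm-up call and `torch.cuda.synchronize()` are additions that I assume make the GPU timing fairer (CUDA work can run asynchronously, and the first CUDA call pays one-time setup costs). It falls back to CPU only if no GPU is available.

```python
import time

import numpy as np
import torch


def time_solve(A_np, b_np, device, dtype, repeats=1):
    """Hypothetical helper: return (seconds per solve, solution tensor)."""
    # Convert and move once, outside the timed region, so transfer cost is excluded.
    A = torch.from_numpy(A_np).to(device=device, dtype=dtype)
    b = torch.from_numpy(b_np).to(device=device, dtype=dtype)
    is_cuda = torch.device(device).type == "cuda"

    # Warm-up call so one-time initialization is not counted in the timing.
    torch.linalg.solve(A, b)
    if is_cuda:
        torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(repeats):
        x = torch.linalg.solve(A, b)
    if is_cuda:
        # Wait for any pending GPU work before stopping the clock.
        torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / repeats
    return elapsed, x


size = 1000  # assumption: much smaller than the 10000 used in the question
rng = np.random.default_rng(0)
A_np = rng.random((size, size))  # float64 by default
b_np = rng.random((size, 1))

devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
results = {}
for device in devices:
    for dtype in (torch.float64, torch.float32):
        seconds, _ = time_solve(A_np, b_np, device, dtype)
        results[(device, dtype)] = seconds
        print(f"PYTORCH {device.upper()}--- {seconds:.4f} seconds ---, {dtype}")
```

The dtype is set in the same `.to(...)` call that moves the tensors, so the conversion happens before timing starts. Absolute numbers will differ by hardware and matrix size.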
Upvotes: 0