Reputation: 106
import numpy as np
import torch

t1_h = torch.tensor(np.arange(100000), dtype=torch.float32)
cuda0 = torch.device('cuda:0')
t1_d = torch.tensor(np.arange(100000), dtype=torch.float32, device=cuda0)
%timeit -n 10000 max_h = torch.max(t1_h, 0)
%timeit -n 10000 max_d = torch.max(t1_d, 0)
10000 loops, best of 3: 144 µs per loop
10000 loops, best of 3: 985 µs per loop
As you can see above, the GPU takes much longer than the CPU. But if I don't specify a dimension when calculating the max, then the GPU is faster:
%timeit -n 10000 max_h = torch.max(t1_h)
%timeit -n 10000 max_d = torch.max(t1_d)
10000 loops, best of 3: 111 µs per loop
10000 loops, best of 3: 41.8 µs per loop
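For reference, torch.max with a dim argument returns both the max values and their indices, while the form without a dim returns only the value:

vals, idxs = torch.max(t1_h, 0)   # named tuple of (values, indices)
print(vals, idxs)                 # tensor(99999.) tensor(99999)
print(torch.max(t1_h))            # tensor(99999.) -- value only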
I also tried argmax instead of max, and there it works as expected (the GPU is faster than the CPU):
%timeit -n 10000 cs_h = torch.argmax(t1_h, 0)
%timeit -n 10000 cs_d = torch.argmax(t1_d, 0)
10000 loops, best of 3: 108 µs per loop
10000 loops, best of 3: 18.1 µs per loop
Is there any reason why torch.max is slow on the GPU when a dimension is specified?
Upvotes: 5
Views: 1278
Reputation: 4961
I discovered this myself and opened an issue in PyTorch. It looks like it will be fixed soon (maybe in version 1.5 or 1.6?), but in the meantime the suggested workaround is:
ii = a.argmax(0)                                   # indices of the max along dim 0
maxval = a.gather(0, ii.unsqueeze(0)).squeeze(0)   # gather the corresponding values
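A quick sanity check (a minimal sketch, assuming a 1-D CUDA tensor like the one in the question) that this matches torch.max:

import torch

a = torch.arange(100000, dtype=torch.float32, device='cuda:0')
ii = a.argmax(0)                                   # index of the max along dim 0
maxval = a.gather(0, ii.unsqueeze(0)).squeeze(0)   # the max value itself
vals, idxs = torch.max(a, 0)                       # reference result
assert torch.equal(maxval, vals) and torch.equal(ii, idxs)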
Upvotes: 1