Deepak Mangla

Reputation: 106

torch.max slower with GPU than with CPU when specifying dimension

import numpy as np
import torch

t1_h = torch.tensor(np.arange(100000), dtype=torch.float32)
cuda0 = torch.device('cuda:0')
t1_d = torch.tensor(np.arange(100000), dtype=torch.float32, device=cuda0)
%timeit -n 10000 max_h = torch.max(t1_h, 0)
%timeit -n 10000 max_d = torch.max(t1_d, 0)

10000 loops, best of 3: 144 µs per loop

10000 loops, best of 3: 985 µs per loop

As you can see above, the GPU takes much more time than the CPU. But if I don't specify a dimension when calculating the max, then the GPU is faster.

%timeit -n 10000 max_h = torch.max(t1_h)
%timeit -n 10000 max_d = torch.max(t1_d)

10000 loops, best of 3: 111 µs per loop

10000 loops, best of 3: 41.8 µs per loop

I also tried argmax instead of max, and there it behaves as expected (the GPU is faster than the CPU).

%timeit -n 10000 cs_h = torch.argmax(t1_h, 0)
%timeit -n 10000 cs_d = torch.argmax(t1_d, 0)

10000 loops, best of 3: 108 µs per loop

10000 loops, best of 3: 18.1 µs per loop

Is there any reason why torch.max is slow on the GPU when a dimension is specified?

Upvotes: 5

Views: 1278

Answers (1)

Andy Jones

Reputation: 4961

I ran into this myself and opened an issue in PyTorch. It looks like it will be fixed soon - possibly in version 1.5 or 1.6 - but in the meantime the suggested workaround is to use

ii = a.argmax(0)
maxval = a.gather(0, ii.unsqueeze(0)).squeeze(0)
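For anyone without a GPU handy, the same argmax-then-gather trick can be sketched in NumPy, where `np.take_along_axis` plays the role of `torch.gather`; this is just an illustration of the workaround on a toy array, not the PyTorch code itself:

```python
import numpy as np

# Toy 2-D array; axis 0 is the dimension we reduce over.
a = np.array([[1.0, 9.0, 3.0],
              [4.0, 2.0, 8.0]])

# Step 1: indices of the max along axis 0 (the fast kernel).
ii = a.argmax(0)  # array([1, 0, 1])

# Step 2: gather the values at those indices, mirroring
# a.gather(0, ii.unsqueeze(0)).squeeze(0) in PyTorch.
maxval = np.take_along_axis(a, ii[None, :], axis=0).squeeze(0)

print(maxval)  # [4. 9. 8.] - identical to a.max(0)
```

The point of the two-step version is that it only ever calls the argmax kernel, which (per the timings in the question) is fast on the GPU, and then does a cheap indexed read instead of the slow `torch.max(a, 0)` path.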

Upvotes: 1
