Reputation: 1383
I want to use the ReLU1 non-linear activation: it is linear on [0, 1], clamps values less than 0 to 0, and clamps values greater than 1 to 1.
It will be used only on the last layer of my deep net in PyTorch, which has a very high-resolution output of 2048x4096. Since the code has to be highly optimized for speed and memory, I don't know which of the following implementations would be best.
These are the two implementations I can think of for a tensor x:
x.clamp_(min=0.0, max=1.0)
I can't find the source code linked from its docs, so I don't know whether it is the best choice. I would prefer an in-place operation, since backpropagation can still happen through it.
The second alternative is torch.nn.functional.hardtanh_(x, min_val=0.0, max_val=1.0). This is definitely an in-place function, and the source shows that it calls into C++ via torch._C._nn.hardtanh(input, min_val, max_val), so I think it will be fast.
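For reference, this is the kind of rough timing sketch I would use to compare the two myself (the tensor size matches my output; the iteration count is arbitrary and the numbers will depend entirely on the hardware):

import time
import torch
import torch.nn.functional as F

# Rough, non-rigorous timing sketch on the target output size (2048x4096).
device = "cuda" if torch.cuda.is_available() else "cpu"

def time_op(fn, iters=100):
    x = torch.randn(2048, 4096, device=device)
    for _ in range(10):  # warm-up
        fn(x)
    if device == "cuda":
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters

print("clamp_   :", time_op(lambda t: t.clamp_(min=0.0, max=1.0)))
print("hardtanh_:", time_op(lambda t: F.hardtanh_(t, min_val=0.0, max_val=1.0)))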
Even so, please suggest which is the most efficient implementation, and another one if possible.
Thank you.
Upvotes: 0
Views: 932
Reputation: 360
Without trying it, my guess is that clamp and hardtanh will have the same speed, and it will be hard to do this operation any faster if you optimize it in isolation. The arithmetic is trivial, so this operation will be bottlenecked by GPU memory bandwidth. To run faster, you'd want to fuse this operation with the operation that produced x. If you don't want to write a custom kernel for the combined operation, you can try using TorchScript.
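For example, here is a minimal sketch of what that could look like; the conv layer is just a stand-in for whatever actually produces x:

import torch
import torch.nn as nn

class ClampedHead(nn.Module):
    # Last layer plus the ReLU1 clamp in one scripted module, so the JIT
    # fuser has a chance to merge the elementwise tail.
    # The conv is only a placeholder for whatever really produces x.
    def __init__(self, in_channels=64, out_channels=1):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, x):
        y = self.conv(x)
        return y.clamp(min=0.0, max=1.0)

head = torch.jit.script(ClampedHead())
out = head(torch.randn(1, 64, 256, 256))  # your real output would be 2048x4096

Whether the fusion actually kicks in depends on the ops involved (the JIT mainly fuses chains of elementwise ops), so it's worth profiling the scripted module against the plain eager version.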
Upvotes: 1