Reputation: 51
I am training a neural network (feedforward, Tanh hidden layers) that receives states as inputs and gives actions as outputs. I am following the REINFORCE algorithm for policy-gradient reinforcement learning.
However, I need my control actions to be bounded (say, between 0 and 5). Currently I do this by using a sigmoid output function and multiplying its output by 5. Although my algorithm performs moderately well, I see the following drawback in this “bounding scheme” for the output:
I know that for regression (and hence, I guess, for reinforcement learning) a linear output is best. Although the sigmoid has an approximately linear region, I am afraid the network has not been able to capture this linear output behaviour correctly, or captures it far too slowly (the sigmoid works best for classification, where it pushes the output towards the extremes).
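To make the concern concrete, here is a minimal plain-Python sketch of the bounding scheme described above, together with its slope with respect to the pre-activation. The function names and the `max_action` parameter are only illustrative, not part of any library, and the real network would apply this element-wise to its output layer.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def scaled_sigmoid_action(x: float, max_action: float = 5.0) -> float:
    """Map a pre-activation x to an action in the open interval (0, max_action)."""
    return max_action * sigmoid(x)


def scaled_sigmoid_slope(x: float, max_action: float = 5.0) -> float:
    """Derivative of the scaled sigmoid with respect to x."""
    s = sigmoid(x)
    return max_action * s * (1.0 - s)


# The slope is largest at x = 0 and shrinks quickly as |x| grows,
# which is the saturation effect the question worries about.
for x in (0.0, 2.0, 6.0):
    print(x, scaled_sigmoid_action(x), scaled_sigmoid_slope(x))
```

Near the middle of the range the mapping is close to linear, but actions near 0 or 5 require large pre-activations, where the slope (and hence the gradient signal) becomes very small.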
I am wondering what other alternatives there are, and maybe some heuristics on the matter.
Upvotes: 2
Views: 1643
Reputation: 114816
Have you considered using nn.ReLU6()? This is a bounded version of the rectified linear unit, whose output is defined as
out = min( max(x, 0), 6)
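Since ReLU6 saturates at 6 rather than 5, its output would still need rescaling to reach the 0-5 range from the question. A minimal plain-Python sketch of that idea follows; `bounded_action`, `low`, and `high` are hypothetical names chosen for illustration, and in a real network the same arithmetic would be applied to the output of the `nn.ReLU6()` layer.

```python
def relu6(x: float) -> float:
    """Same formula as above: clip x to the interval [0, 6]."""
    return min(max(x, 0.0), 6.0)


def bounded_action(x: float, low: float = 0.0, high: float = 5.0) -> float:
    """Rescale the ReLU6 output linearly from [0, 6] to [low, high]."""
    return low + (high - low) * relu6(x) / 6.0


# Linear in the pre-activation for 0 <= x <= 6, clipped outside that range.
for x in (-3.0, 0.0, 3.0, 6.0, 100.0):
    print(x, bounded_action(x))
```

One thing to keep in mind with this choice: the output is exactly linear inside the bounds, but the gradient is zero once the pre-activation falls below 0 or above 6, so a unit that ends up in the clipped region receives no learning signal from that sample.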
Upvotes: 2