zergylord

Reputation: 4446

Action constraints in actor-critic reinforcement learning

I've implemented the natural actor-critic RL algorithm on a simple grid world with four possible actions (up, down, left, right), and I've noticed that in some cases it tends to get stuck oscillating between up and down, or between left and right.

Now, in this domain up-down and left-right are opposites, and I feel that learning might be improved if I could somehow make the agent aware of this fact. I was thinking of simply adding a step after the action activations are calculated (e.g. subtracting the left activation from the right activation and vice versa). However, I'm afraid of this causing convergence issues in the general case.
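For concreteness, a minimal sketch of that post-processing step might look like the following, assuming a softmax (Gibbs) policy over per-action activations; the action ordering and the temperature parameter are hypothetical choices of mine, not something specified above:

    import numpy as np

    # Hypothetical action indices; the post doesn't fix an ordering.
    UP, DOWN, LEFT, RIGHT = 0, 1, 2, 3

    def opposite_adjusted_policy(activations, temperature=1.0):
        """Softmax policy over the four action activations, after the
        proposed adjustment: each activation has its opposite subtracted,
        so a strong preference for one direction suppresses its reverse."""
        adjusted = np.asarray(activations, dtype=float).copy()
        adjusted[UP] -= activations[DOWN]
        adjusted[DOWN] -= activations[UP]
        adjusted[LEFT] -= activations[RIGHT]
        adjusted[RIGHT] -= activations[LEFT]
        prefs = adjusted / temperature
        prefs -= prefs.max()  # subtract max for numerical stability
        probs = np.exp(prefs)
        return probs / probs.sum()

One caveat with applying this only at action-selection time: the actor update differentiates the policy with respect to its parameters, so if the adjustment isn't reflected in that gradient, the updates are computed for a policy the agent isn't actually following, which may be exactly the source of the convergence worry.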

It seems as though adding constraints like this would be a common desire in the field, so I was wondering if anyone knows of a standard method I should be using for this purpose. And if not, whether my ad-hoc approach seems reasonable.

Thanks in advance!

Upvotes: 2

Views: 945

Answers (1)

danelliottster

Reputation: 365

I'd stay away from using heuristics in the selection of actions, if at all possible. If you want to add heuristics to your training, I'd do it in the calculation of the reward function. That way the agent will learn and embody the heuristic as a part of the value function it is approximating.
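As a rough illustration of that suggestion, the shaped reward could penalize immediately reversing the previous move; the OPPOSITE table and the penalty size here are hypothetical choices, not part of the question:

    # Hypothetical shaping: penalize undoing the previous action.
    OPPOSITE = {"up": "down", "down": "up", "left": "right", "right": "left"}

    def shaped_reward(env_reward, prev_action, action, penalty=0.1):
        """Environment reward minus a small penalty whenever the chosen
        action reverses the previous one (up<->down or left<->right)."""
        if prev_action is not None and OPPOSITE[action] == prev_action:
            return env_reward - penalty
        return env_reward

Note that ad-hoc shaping like this can change which policy is optimal; potential-based shaping (Ng et al., 1999) is the standard way to add such hints without altering the optimal policy.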

About the oscillation behavior: do you allow for a no-movement action (i.e. staying in the same location)?

Finally, I wouldn't worry too much about violating convergence guarantees in the general case; they are merely guidelines when doing applied work.

Upvotes: 2
