Xeyes

Reputation: 599

How to reduce a neural network output when a certain action isn't performable

I'm using a neural network with TensorFlow for reinforcement learning (Q-learning) on various tasks, and I want to know how to exclude output possibilities when the action corresponding to a specific output isn't realisable in the environment at a given state.

For example, my network is learning to play a game in which 4 actions can be performed. But there is a specific state in which action 1 isn't performable in the environment, yet my network's Q-values indicate that action 1 is the best thing to do. What should I do in this situation?

(Is just choosing a random valid action the best way to counter this problem?)

Upvotes: 3

Views: 455

Answers (1)

Afshin Oroojlooy

Reputation: 1434

You should simply ignore the invalid action(s) and select the action with the highest Q-value among the valid actions. Then, in the training step, either multiply the Q-values by the one-hot encoding of the selected action, or use the `gather_nd` API to pick out the right Q-value, compute the loss, and run a single gradient update. In other words, the losses of the invalid action(s) and of all other non-selected actions are taken to be zero, and only then are the gradients updated.
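To make this concrete, here is a minimal NumPy sketch of the two steps (the Q-values, mask, and TD target are made-up numbers for illustration; in TensorFlow you would do the same with `tf.where`, `tf.one_hot`, or `tf.gather_nd`):

```python
import numpy as np

# Hypothetical Q-values for the 4 actions at some state.
q_values = np.array([2.5, 1.0, 0.7, 1.8])

# Validity mask for this state: action 0 is not performable here.
valid_mask = np.array([False, True, True, True])

# Action selection: set invalid actions to -inf so argmax never picks them.
masked_q = np.where(valid_mask, q_values, -np.inf)
action = int(np.argmax(masked_q))  # index 3: highest Q among valid actions

# Training step: the one-hot multiply keeps only the taken action's Q-value,
# so the loss (and hence the gradient) touches only that output.
one_hot = np.eye(len(q_values))[action]
selected_q = np.sum(q_values * one_hot)

td_target = 2.0  # hypothetical target, e.g. r + gamma * max_a' Q(s', a')
loss = (td_target - selected_q) ** 2
```

Masking only at action-selection time keeps the network architecture unchanged: the invalid outputs still exist, they are just never chosen and never contribute to the loss.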

This way, the network gradually learns to increase the Q-value of the right action, since only that action's gradient is updated.

I hope this answers your question.

Upvotes: 2

Related Questions