HassanSh__3571619

Reputation: 2077

Reinforcement learning with pair of actions

I'm learning reinforcement learning in Python and have followed some tutorials. Most of them deal with simple actions (like up, down, right, or left), so basically one action at a time. In my project the actions are different: each one is a pair, consisting of an action type plus an offset applied with that action, i.e. (action-type, offset-taken). The action types are, for example: u1_set, u1_clear, u2_set, u2_clear, u3_set, u3_clear. Each action also has an attenuation offset associated with it (an offset such as -1, -0.5, 0, +0.5, +1), so some example action pairs are (u2_set, +1), (u2_clear, -0.5), etc.

I'm wondering what the best way is to implement reinforcement learning in this situation (pairs of action type and offset), and whether there is a good example available online that someone could share.

Thanks in advance.

Upvotes: 1

Views: 359

Answers (1)

Dennis Soemers

Reputation: 8488

By far the easiest approach will be to simply treat every possible pair of "sub-actions" as a single complete action. So, in your example, every action is a pair (U, Offset), where U is one of {u1_set, u1_clear, u2_set, u2_clear, u3_set, u3_clear}, and Offset is one of {-1, -0.5, 0, +0.5, +1}. With this example, there would be a total of 6 x 5 = 30 possible pairs, so 30 different actions. That should be perfectly fine for most RL approaches.
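As a rough illustration of this flattening, here is a minimal sketch using tabular Q-learning. The choice of Q-learning, the names `ACTIONS`, `q_table`, `choose_action`, and `q_update`, and the hyperparameter values are all assumptions made for the example, not something from the question; states are assumed to be hashable.

```python
import itertools
import random
from collections import defaultdict

ACTION_TYPES = ["u1_set", "u1_clear", "u2_set", "u2_clear", "u3_set", "u3_clear"]
OFFSETS = [-1.0, -0.5, 0.0, 0.5, 1.0]

# Every (action_type, offset) pair becomes one discrete action, indexed 0..29.
ACTIONS = list(itertools.product(ACTION_TYPES, OFFSETS))

# Q-table: state -> list of 30 action values (assumes hashable states).
q_table = defaultdict(lambda: [0.0] * len(ACTIONS))


def choose_action(state, epsilon=0.1):
    """Epsilon-greedy selection over the flattened action indices."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    values = q_table[state]
    return max(range(len(ACTIONS)), key=values.__getitem__)


def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update (alpha and gamma are illustrative)."""
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])
```

The agent only ever sees an integer index; `ACTIONS[index]` recovers the `(action_type, offset)` pair to pass to your environment.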

If you move on to more complex situations (too many possible pairs), you could start considering more complex solutions as well. For example, you could treat the problem of selecting an action type as a first RL problem, and then the problem of selecting an offset as an additional, separate RL problem (possibly with an enhanced state representation that also contains the already-selected action type).
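One possible way to sketch that two-stage idea is with two small Q-tables, where the second one is keyed on the state together with the already-selected action type. This is only an illustration under several assumptions: the names `q_type`, `q_offset`, `choose_pair`, and `update` are hypothetical, and giving both learners the same bootstrapped target is a simplification chosen for brevity, not the only reasonable design.

```python
import random
from collections import defaultdict

ACTION_TYPES = ["u1_set", "u1_clear", "u2_set", "u2_clear", "u3_set", "u3_clear"]
OFFSETS = [-1.0, -0.5, 0.0, 0.5, 1.0]

# First learner: state -> values of the 6 action types.
q_type = defaultdict(lambda: [0.0] * len(ACTION_TYPES))
# Second learner: (state, chosen type index) -> values of the 5 offsets.
q_offset = defaultdict(lambda: [0.0] * len(OFFSETS))


def epsilon_greedy(values, epsilon):
    """Pick a random index with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(values))
    return max(range(len(values)), key=values.__getitem__)


def choose_pair(state, epsilon=0.1):
    """Select the action type first, then an offset given that type."""
    type_idx = epsilon_greedy(q_type[state], epsilon)
    offset_idx = epsilon_greedy(q_offset[(state, type_idx)], epsilon)
    return type_idx, offset_idx


def update(state, type_idx, offset_idx, reward, next_state,
           alpha=0.1, gamma=0.99):
    """Update both learners toward the same target (a simplifying assumption)."""
    target = reward + gamma * max(q_type[next_state])
    q_type[state][type_idx] += alpha * (target - q_type[state][type_idx])
    key = (state, type_idx)
    q_offset[key][offset_idx] += alpha * (target - q_offset[key][offset_idx])
```

With 6 types and 5 offsets this saves nothing over the flat 30-action table; the split only starts to pay off when the number of pairs grows large.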

Or, if you were to move on to Reinforcement Learning with Neural Networks, you could simply have two separate "heads" as output layers, both connected to otherwise the same architecture.
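To make the "two heads" idea concrete, here is a dependency-free sketch of the forward pass only: one shared hidden layer feeds two separate output layers, one producing a distribution over action types and one over offsets. The layer sizes, the random initialisation, the softmax (policy-style) outputs, and all names (`W_shared`, `W_type`, `W_offset`, `forward`, `sample_pair`) are assumptions for illustration. Training is omitted, and in practice you would likely build this in a deep learning library rather than by hand.

```python
import math
import random

N_TYPES, N_OFFSETS = 6, 5
STATE_DIM, HIDDEN_DIM = 4, 8  # hypothetical sizes for the example

rng = random.Random(0)


def make_matrix(rows, cols):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W_shared = make_matrix(HIDDEN_DIM, STATE_DIM)   # shared trunk
W_type = make_matrix(N_TYPES, HIDDEN_DIM)       # head 1: action type
W_offset = make_matrix(N_OFFSETS, HIDDEN_DIM)   # head 2: offset


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(state):
    """Return (type_probs, offset_probs) computed from shared features."""
    hidden = [math.tanh(h) for h in matvec(W_shared, state)]
    return softmax(matvec(W_type, hidden)), softmax(matvec(W_offset, hidden))


def sample_pair(state):
    """Sample one index from each head independently."""
    type_probs, offset_probs = forward(state)
    type_idx = rng.choices(range(N_TYPES), weights=type_probs)[0]
    offset_idx = rng.choices(range(N_OFFSETS), weights=offset_probs)[0]
    return type_idx, offset_idx
```

Note that sampling the two heads independently means the offset choice does not depend on the chosen action type; if that dependence matters, the type could be fed into the offset head as an extra input.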

I suspect those last two paragraphs may be unnecessarily complex, especially if you've only just started learning RL, and the first paragraph may be just fine.

Upvotes: 1
