Reputation: 1
I am working on DDPG and created my own custom environment, and I have noticed something strange: my agent collects exactly the same amount of reward in every episode. Each episode is 1000 steps long, and the total reward stays identical across all 1000 episodes. I am sure there is a problem either with the agent or with a lack of randomness in my custom environment. How can I address this problem?
Upvotes: 0
Views: 161
Reputation: 36
Are the reward values you get exactly the same, or is there a very small difference between them?
If they are exactly the same, there may be a problem with how your reward function is defined. If the reward function is discrete, you might get the same reward again and again. Also, if the action is zero, then $s_t = s_{t+1}$ in a static environment, so you get the same reward for any state.
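One common cause of identical episodes is a deterministic `reset()` that always returns the same start state. A minimal Gym-style sketch (the environment dynamics and value ranges here are just placeholders) of randomizing the initial state so that episodes differ:

```python
import numpy as np

class RandomizedResetEnv:
    """Minimal Gym-style sketch: randomize the start state on each reset
    so episodes do not replay identically. Dynamics and reward here are
    purely illustrative, not your actual environment."""

    def __init__(self, seed=None):
        self.rng = np.random.default_rng(seed)
        self.state = None

    def reset(self):
        # Draw the initial state from a distribution instead of a constant.
        self.state = self.rng.uniform(-0.05, 0.05, size=3)
        return self.state

    def step(self, action):
        # Hypothetical dynamics: small drift driven by the action.
        self.state = self.state + 0.01 * np.asarray(action, dtype=float)
        # Continuous shaping reward, not a constant or a coarse discrete value.
        reward = -float(np.sum(self.state ** 2))
        done = False
        return self.state, reward, done, {}
```

If two consecutive calls to `reset()` return the same state, that alone can explain identical episode returns for a deterministic policy.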
If there is a slight difference between them, and the reward value you get is very low, the reason may be that your algorithm is not learning and is acting randomly. This could be because of improperly tuned hyperparameters in your DDPG or, again, an issue with how your reward function is defined. A poorly designed reward function can discourage exploration. Also, there might be a lack of randomness in your environment.
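Since DDPG's actor is deterministic, exploration has to be injected explicitly; without it, the agent can repeat the same trajectory every episode. A common fix is to add noise to the actions during training. A minimal sketch using Gaussian noise (the noise scale and action bounds below are placeholder values; the original DDPG paper used Ornstein-Uhlenbeck noise, but plain Gaussian noise is widely used as well):

```python
import numpy as np

def noisy_action(policy_action, noise_std=0.1, low=-1.0, high=1.0, rng=None):
    """Add Gaussian exploration noise to a deterministic DDPG action,
    then clip the result to the action bounds. The std and bounds are
    illustrative defaults, not tuned values."""
    rng = rng if rng is not None else np.random.default_rng()
    action = np.asarray(policy_action, dtype=float)
    noise = rng.normal(0.0, noise_std, size=action.shape)
    return np.clip(action + noise, low, high)
```

During training you would call `noisy_action(actor(state))` instead of `actor(state)`, and typically decay `noise_std` over time; at evaluation you drop the noise entirely.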
Upvotes: 0