Obaida Ammar

Reputation: 21

Stable Baselines3: TensorBoard train/ep_rew_mean Graph Decreasing Despite Agent Improvement

Sorry for the long question!

I'm working with a Reinforcement Learning custom environment using Stable Baselines3's SAC algorithm. My environment has a max_steps_per_episode of 500. If the agent doesn't reach the goal within these steps, the episode is truncated and reset.

I'm observing an unusual trend in the TensorBoard graph for train/ep_rew_mean. The curve starts at a high reward value, then decreases before converging to a lower value. If my agent were performing well, I would expect this graph to increase and then converge.

I believe the issue is that the train/ep_rew_mean graph plots the accumulated reward per episode. Since my agent is learning to find more efficient solutions and reach the goal in fewer steps (the train/ep_len_mean graph decreases and then converges, so the number of steps per episode shrinks as training progresses), the accumulated reward decreases in later episodes even though the agent is performing better.

The left graph shows ep_len_mean, which looks good: the number of steps per episode decreases as total_timesteps increases. The right graph shows ep_rew_mean; I expected an increasing curve followed by a plateau, but instead it decreases for the reason above.
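A small numeric sketch of why this happens (the per-step reward of 2.0 and the episode lengths are illustrative numbers, not taken from my environment): when each step yields a roughly constant positive reward, a shorter episode collects a smaller return even though the policy is better.

```python
# Illustrative values (assumed): constant per-step reward, episodes
# shrinking from 500 steps early in training to 100 steps later on.
per_step_reward = 2.0

early_return = per_step_reward * 500  # long early episodes
late_return = per_step_reward * 100   # efficient later episodes

# The episode return falls even though behavior improved ...
assert late_return < early_return
# ... while the per-step average is unchanged, which is why it is a
# better progress metric in this setting.
assert late_return / 100 == early_return / 500 == per_step_reward
```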

This is the step function:

def step(self, action):
    # Execute one time step within the environment
    # (assumes `import numpy as np` at module level)
    terminated = False
    truncated = False

    new_state, reward, self.done, info = self._take_action(action)

    # Track the best (reward, state) pair seen so far
    if not self.best_episode_state:
        self.best_episode_state[reward] = new_state.get_points()
    elif reward > next(iter(self.best_episode_state)):
        self.best_episode_state.clear()
        self.best_episode_state[reward] = new_state.get_points()

    if self.done:
        terminated = True
        print("**********(Terminated)**********")

    # Count the step first, then truncate once the 500-step cap is
    # reached, so an episode lasts exactly max_steps_per_episode steps
    self.current_step += 1
    self.total_steps += 1

    if self.current_step >= 500 and not terminated:
        truncated = True
        print("***********(Truncated)***********")

    self.state = new_state
    return np.array(new_state.get_points()), reward, terminated, truncated, info
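The truncation bookkeeping above can be isolated into a tiny stdlib-only helper (the function name and the standalone cap constant are mine, for illustration), which makes the intended behavior easy to check: count the step first, then truncate exactly when the cap is reached and the goal was not hit.

```python
# Minimal sketch of the truncation rule: an episode is truncated once
# the (already incremented) step counter reaches the episode cap,
# unless the episode already terminated by reaching the goal.
MAX_STEPS = 500

def should_truncate(current_step, terminated, max_steps=MAX_STEPS):
    """Return True when the step cap is hit and the goal was not reached."""
    return current_step >= max_steps and not terminated
```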

I want the train/ep_rew_mean graph to show an increasing trend as the agent learns to maximize rewards, eventually reaching a plateau (convergence). I'd like to see a graph that reflects the agent's improvement, not the total reward accumulated in each episode.

Is there a way to modify the train/ep_rew_mean graph or my code to display the average reward per step instead of the accumulated reward per episode? This way, I can better visualize the agent's performance improvement.

Upvotes: 0

Views: 123

Answers (0)
