Obaida Ammar

Reputation: 21

Stable Baselines3: TensorBoard train/ep_rew_mean Graph Decreasing Despite Agent Improvement

Sorry for the long question!

I'm working with a Reinforcement Learning custom environment using Stable Baselines3's SAC algorithm. My environment has a max_steps_per_episode of 500. If the agent doesn't reach the goal within these steps, the episode is truncated and reset.

I'm observing an unusual trend in the TensorBoard graph for train/ep_rew_mean. The curve starts at a high reward value, then decreases before converging to a lower value. If my agent were performing well, I would expect this graph to increase and then converge.

I believe the issue is that the train/ep_rew_mean graph plots the accumulated reward per episode. Since my agent is learning to find more efficient solutions and reach the goal in fewer steps (the train/ep_len_mean graph decreases and then converges, so the number of steps per episode shrinks as training progresses), the accumulated reward decreases in later episodes even though the agent is performing better.

The left graph shows ep_len_mean, which looks good: the number of steps per episode decreases as total_timesteps increases. The right graph shows ep_rew_mean; I expected an increasing curve followed by a plateau, but instead it decreases for the reason above.
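A small numeric sketch of why this happens (the per-step reward of 2.0 and the episode lengths are illustrative numbers, not taken from my environment): when each step yields a roughly constant positive reward, a shorter episode collects a smaller return even though the policy is better.

```python
# Illustrative values (assumed): constant per-step reward, episodes
# shrinking from 500 steps early in training to 100 steps later on.
per_step_reward = 2.0

early_return = per_step_reward * 500  # long early episodes
late_return = per_step_reward * 100   # efficient later episodes

# The episode return falls even though behavior improved ...
assert late_return < early_return
# ... while the per-step average is unchanged, which is why it is a
# better progress metric in this setting.
assert late_return / 100 == early_return / 500 == per_step_reward
```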

This is the step function:

def step(self, action):
    # Execute one time step within the environment
    # (assumes `import numpy as np` at module level)
    terminated = False
    truncated = False

    new_state, reward, self.done, info = self._take_action(action)

    # Track the best (reward, state) pair seen so far
    if not self.best_episode_state:
        self.best_episode_state[reward] = new_state.get_points()
    elif reward > next(iter(self.best_episode_state)):
        self.best_episode_state.clear()
        self.best_episode_state[reward] = new_state.get_points()

    if self.done:
        terminated = True
        print("**********(Terminated)**********")

    # Count the step first, then truncate once the 500-step cap is
    # reached, so an episode lasts exactly max_steps_per_episode steps
    self.current_step += 1
    self.total_steps += 1

    if self.current_step >= 500 and not terminated:
        truncated = True
        print("***********(Truncated)***********")

    self.state = new_state
    return np.array(new_state.get_points()), reward, terminated, truncated, info
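The truncation bookkeeping above can be isolated into a tiny stdlib-only helper (the function name and the standalone cap constant are mine, for illustration), which makes the intended behavior easy to check: count the step first, then truncate exactly when the cap is reached and the goal was not hit.

```python
# Minimal sketch of the truncation rule: an episode is truncated once
# the (already incremented) step counter reaches the episode cap,
# unless the episode already terminated by reaching the goal.
MAX_STEPS = 500

def should_truncate(current_step, terminated, max_steps=MAX_STEPS):
    """Return True when the step cap is hit and the goal was not reached."""
    return current_step >= max_steps and not terminated
```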

I want the train/ep_rew_mean graph to show an increasing trend as the agent learns to maximize rewards, eventually reaching a plateau (convergence). I'd like to see a graph that reflects the agent's improvement, not the total reward accumulated in each episode.

Is there a way to modify the train/ep_rew_mean graph or my code to display the average reward per step instead of the accumulated reward per episode? This way, I can better visualize the agent's performance improvement.

Upvotes: 0

Views: 123

Answers (0)
