Krisdar001

Reputation: 11

DDQN: My model had already converged, but when I stopped learning and tested for 100 rounds, it immediately collapsed. Why is that?

I ran a maze exploration model using DDQN, and it performed very well during training, so I wanted to save the model. However, when I reloaded it, it behaved as if it had never been trained, so I began to suspect that I had saved it incorrectly. I therefore stopped saving the model and instead set up 500 rounds in a single run. The model converged well before the 400th round, so after round 400 I stopped calling the learn() method and ran 100 rounds of testing. At that point the network parameters should no longer be updated and should stay exactly as they were at round 400. I expected good results, but things didn't go as planned; my result chart is at the end of the post.
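For context, the save/restore pattern I was trying to follow looks roughly like the sketch below (TF1-style). The variable, session, and checkpoint path are placeholders for illustration, not my actual network code.

import os
import tensorflow as tf

# Minimal TF1-style save/restore sketch; names and paths are placeholders.
w = tf.Variable(tf.zeros([4, 2]), name="q_eval_w")  # stands in for the network weights
saver = tf.train.Saver(max_to_keep=1)

os.makedirs("./checkpoints", exist_ok=True)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training would happen here ...
    saver.save(sess, "./checkpoints/ddqn", global_step=400)

# Later, e.g. in a separate test script: rebuild the same graph first, then
# restore. Do NOT run the initializer again after restoring, otherwise the
# restored weights are overwritten with fresh random values.
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint("./checkpoints"))
    print(sess.run(w))  # the saved weights are back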

def train_maze():
    step_counter = 0
    global episode
    saver = tf.train.Saver(max_to_keep=1)  # create a Saver object for saving the model
    for episode in range(450):
        RL.time = episode
        total_reward = 0
        observation = env.reset()
        while True:
            action = RL.choose_action(observation)
            observation_, reward, done = env.step(action)
            total_reward += reward
            RL.store_transition(observation, action, reward, observation_)

            # Learn occasionally while the replay memory is filling,
            # then every step once it is full, but only up to episode 400.
            if (3000 < step_counter <= RL.memory_size) and (step_counter % 500 == 0):
                RL.learn()
            elif (step_counter > RL.memory_size) and (episode <= 400):
                RL.learn()

            observation = observation_
            step_counter += 1
            if done:
                if episode <= 400:  # extra "fine" updates only during the training phase
                    RL._discount_and_norm_rewards()
                    if env.is_success:
                        # RL.store_fine_transition(RL.episode_memory)
                        if len(RL.fine_memory) > RL.batch_size:
                            for i in range(4):
                                RL.learn_fine()
                    else:
                        if len(RL.fine_memory) > RL.batch_size:
                            for i in range(400):
                                RL.learn_fine()
                RL.episode_memory.clear()
                RL.ep_rs.clear()
                ep_total_reward.append(total_reward)
                ep_total_step.append(env.ep_step)
                logger.info("episode: {}, total_reward: {}, episode_step: {}".format(
                    episode, int(total_reward), env.ep_step))
                break
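
For completeness, the test phase I have in mind has roughly the shape below: the same environment loop, but with no store_transition() or learn() calls. This test_maze function is only a sketch and is not part of my code above; if choose_action is still epsilon-greedy, it may also need to be forced to act greedily here.

def test_maze(num_episodes=100):
    """Evaluation sketch: run the frozen policy, never update the network."""
    rewards, steps = [], []
    for episode in range(num_episodes):
        total_reward = 0
        observation = env.reset()
        while True:
            action = RL.choose_action(observation)  # act only; no learning
            observation, reward, done = env.step(action)
            total_reward += reward
            if done:
                rewards.append(total_reward)
                steps.append(env.ep_step)
                break
    return rewards, steps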

I am quite certain that the only change I made on top of the 400-round setup was raising the episode count from 400 to 500, and I am sure that my initial environment has always been the same.

[result chart]

Upvotes: 0

Views: 18

Answers (0)
