Reputation: 11
I ran a maze exploration model using DDQN, and it performed very well during training, so I wanted to save the model. However, when I reloaded it, it behaved as if it had never been trained. I began to suspect that I had saved it incorrectly. So I stopped saving the model and instead set up 500 rounds of training. The model achieved good convergence before the 400th round, so after the 400th round I stopped calling the learn() method and ran 100 rounds of testing. At this point the network parameters should no longer be updated and should remain exactly as they were at round 400. I expected good results, but things didn't go as planned. My result chart is below.
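For reference, this is how I understand tf.train.Saver is supposed to be used to save and later restore a model (the stand-in variable, the checkpoint path, and the point where save() is called are only placeholders; my real save code is not shown here):

import os
import tensorflow as tf  # TF1-style API, matching the tf.train.Saver in my code

# A stand-in variable; in the real code this would be the DDQN network weights.
w = tf.get_variable("w", shape=[4, 4])
saver = tf.train.Saver(max_to_keep=1)
ckpt_dir = "./ckpt"  # placeholder path
os.makedirs(ckpt_dir, exist_ok=True)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training would happen here ...
    # Save AFTER the weights have converged.
    saver.save(sess, os.path.join(ckpt_dir, "ddqn"), global_step=400)

# Later (or in a separate evaluation script): rebuild the SAME graph, then
# restore instead of re-initializing. Running global_variables_initializer()
# after restore() would overwrite the restored weights with random values.
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint(ckpt_dir))

Here is my training function: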
def train_maze():
    step_counter = 0
    global episode
    saver = tf.train.Saver(max_to_keep=1)  # Create a Saver object for saving the model
    for episode in range(450):
        RL.time = episode
        total_reward = 0
        observation = env.reset()
        while True:
            action = RL.choose_action(observation)
            observation_, reward, done = env.step(action)
            total_reward += reward
            RL.store_transition(observation, action, reward, observation_)
            if (3000 < step_counter <= RL.memory_size) and (step_counter % 500 == 0):
                RL.learn()
            elif (step_counter > RL.memory_size) and (episode <= 400):
                RL.learn()
            observation = observation_
            step_counter += 1
            if done:
                if episode <= 400:
                    RL._discount_and_norm_rewards()
                    if env.is_success:
                        # RL.store_fine_transition(RL.episode_memory)
                        if len(RL.fine_memory) > RL.batch_size:
                            for i in range(4):
                                RL.learn_fine()
                    else:
                        if len(RL.fine_memory) > RL.batch_size:
                            for i in range(400):
                                RL.learn_fine()
                RL.episode_memory.clear()
                RL.ep_rs.clear()
                ep_total_reward.append(total_reward)
                ep_total_step.append(env.ep_step)
                logger.info("episode: {}, total_reward: {}, episode_step: {}".format(episode, int(total_reward), env.ep_step))
                break
I am quite certain that all I did was add extra rounds on top of the original 400, i.e. I only changed the number of episodes from 400 to 500, and I am sure that my initial environment has always been the same.
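To double-check my assumption that the weights stay frozen after episode 400, I think a comparison along these lines could be added (this is only a sketch; it assumes the agent exposes its TensorFlow session as RL.sess, which is a guess about the attribute name):

import numpy as np
import tensorflow as tf

def snapshot_weights(sess):
    # Read the current values of every trainable variable in the graph.
    return sess.run(tf.trainable_variables())

# Take one snapshot right after episode 400 and another after the test
# episodes, then confirm that nothing changed element-wise.
before = snapshot_weights(RL.sess)  # RL.sess is an assumed attribute
# ... run the remaining test episodes without calling learn() ...
after = snapshot_weights(RL.sess)
unchanged = all(np.array_equal(a, b) for a, b in zip(before, after))
print("weights unchanged during testing:", unchanged)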
Upvotes: 0
Views: 18