Reputation: 4115
I'm training a modified InceptionV3 model with multi_gpu_model in Keras, and I use model.save to save the whole model. I then close and restart the IDE and use load_model to reinstantiate the model. The problem is that I am not able to resume the training exactly where I left off.
Here is the code:
from keras.utils import multi_gpu_model

parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
history = parallel_model.fit_generator(generate_batches(path), steps_per_epoch=num_images / batch_size, epochs=num_epochs)
model.save('my_model.h5')  # saves the template model, not the parallel wrapper
Before the IDE closed, the loss was around 0.8. After restarting the IDE, reloading the model, and re-running the above code, the loss became 1.5. But according to the Keras FAQ, model.save should save the whole model (architecture + weights + optimizer state), and load_model should return a compiled model that is identical to the previous one. So I don't understand why the loss becomes larger after resuming the training.
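For reference, the resume step looks roughly like this (a sketch of what I run after restarting the IDE; the wrapping and compile calls mirror the ones above):
from keras.models import load_model
from keras.utils import multi_gpu_model

# Reinstantiate the saved model, then wrap and compile it again as above
model = load_model('my_model.h5')
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')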
EDIT: If I don't use the multi_gpu_model
and just use the ordinary model, I'm able to resume exactly where I left off.
Upvotes: 3
Views: 1089
Reputation: 39
@saul19am When you compile the reloaded model yourself, you only restore the weights and the model structure; the optimizer state is still lost. I think this can help.
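In code, the difference looks roughly like this (a sketch; the weights file name is a placeholder):
# Weights-only round trip: recompiling creates a fresh optimizer, so
# RMSprop's accumulated averages are gone and the next updates differ.
model.save_weights('weights.h5')  # placeholder file name
model.load_weights('weights.h5')
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# Whole-model round trip: load_model also restores the optimizer state.
from keras.models import load_model
model = load_model('my_model.h5')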
Upvotes: 0
Reputation: 11
When you call multi_gpu_model(...), Keras automatically sets the weights of your model to some default values (at least in version 2.2.0, which I am currently using). That's why you were not able to resume the training at the point where you saved it.
I just solved the issue by replacing the weights of the parallel model with the weights from the sequential model:
parallel_model = multi_gpu_model(model, gpus=2)
# Copy the trained weights back into the parallel wrapper; you can check
# the index of the inner (template) model with parallel_model.summary()
parallel_model.layers[-2].set_weights(model.get_weights())
parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
history = parallel_model.fit_generator(generate_batches(path), steps_per_epoch=num_images / batch_size, epochs=num_epochs)
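For completeness, a resume sketch that combines load_model with this fix (assuming the file name from the question; the layers[-2] index depends on the architecture, so check parallel_model.summary() first):
from keras.models import load_model
from keras.utils import multi_gpu_model

# Reload the template model with its trained weights, then re-apply the fix
model = load_model('my_model.h5')
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.layers[-2].set_weights(model.get_weights())
parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')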
I hope this will help you.
Upvotes: 1