Reputation: 13
I am trying to average the tensors of two models that have an identical structure but were trained on different datasets. The models are stored in ckpt files.
I looked at the avg_checkpoints module from tensor2tensor but have no idea how to use it.
How do I solve the problem?
from tensor2tensor.utils import avg_checkpoints
print(avg_checkpoints.checkpoint_exists("/"))
# This prints True in the console
# I have copied the final ckpt from each model to the root directory
avg_checkpoints.main(?)
# no idea what to replace the ? with
Upvotes: 1
Views: 174
Reputation: 2670
avg_checkpoints.py is an executable script, so you can use it from the command line, e.g.:
python utils/avg_checkpoints.py \
  --checkpoints path/to/checkpoint1,path/to/checkpoint2 \
  --num_last_checkpoints 2 \
  --output_path where/to/save/the/output
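If you would rather launch the script from Python than from a shell, one option is to build the same command and pass it to subprocess. This is only a minimal sketch, not part of tensor2tensor: build_avg_command is a hypothetical helper name, the script location and checkpoint paths are the placeholders from the command above, and the call that actually runs the script is left commented out.

```python
import subprocess
import sys


def build_avg_command(checkpoints, output_path, script="utils/avg_checkpoints.py"):
    """Build the argument list for the avg_checkpoints.py command shown above.

    build_avg_command is a hypothetical helper, not a tensor2tensor function.
    """
    return [
        sys.executable, script,
        "--checkpoints", ",".join(checkpoints),
        "--num_last_checkpoints", str(len(checkpoints)),
        "--output_path", output_path,
    ]


cmd = build_avg_command(
    ["path/to/checkpoint1", "path/to/checkpoint2"],  # placeholder paths
    "where/to/save/the/output",                      # placeholder output path
)
# Uncomment to actually run the script (needs tensor2tensor and real paths):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues, and check=True would raise an exception if the script exits with an error.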
Note that if the two checkpoints were trained on different datasets from scratch, the averaging would not work. If you had a single pre-trained model which you just fine-tuned on two different datasets, then the averaging could work.
You can average more than two checkpoints. A hacky but simple way to add weights for each checkpoint is to include it multiple times in --checkpoints (and increase --num_last_checkpoints accordingly).
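To see why repeating a checkpoint acts as a weight, here is a small pure-Python illustration. It does not read real ckpt files or call tensor2tensor: average_checkpoints is a hypothetical toy stand-in that treats each checkpoint as a dict mapping variable names to lists of floats and takes a plain element-wise mean, which is what an unweighted average amounts to.

```python
def average_checkpoints(checkpoints):
    """Element-wise mean of each variable across checkpoints.

    Each checkpoint is a dict {variable_name: list of floats}. This is a
    hypothetical toy stand-in, not the tensor2tensor implementation.
    """
    n = len(checkpoints)
    averaged = {}
    for name in checkpoints[0]:
        # Group the i-th element of this variable from every checkpoint.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged


ckpt_a = {"w": [0.0, 4.0]}
ckpt_b = {"w": [8.0, 0.0]}

# Plain average: each checkpoint contributes 1/2.
uniform = average_checkpoints([ckpt_a, ckpt_b])

# Listing ckpt_a three times gives it weight 3/4 and ckpt_b weight 1/4.
weighted = average_checkpoints([ckpt_a, ckpt_a, ckpt_a, ckpt_b])

print(uniform)   # {'w': [4.0, 2.0]}
print(weighted)  # {'w': [2.0, 3.0]}
```

The weights you can express this way are limited to ratios of small integers, which is why the trick is hacky but often good enough.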
Upvotes: 2