NANJIA WANG

Reputation: 13

How to get tensor from multiple models and average them?

I am trying to average the tensors of two models that have an identical structure but were trained on different datasets. The models are stored in ckpt files.

I looked at the avg_checkpoints function from tensor2tensor but have no idea how to use it.

How do I solve the problem?

from tensor2tensor.utils import avg_checkpoints

print(avg_checkpoints.checkpoint_exists("/"))
# This prints True in the console.
# I have copied the final ckpt from each model to the root directory.

avg_checkpoints.main(?)
# No idea what to replace the ? with.

Upvotes: 1

Views: 174

Answers (1)

Martin Popel

Reputation: 2670

avg_checkpoints.py is an executable script, so you can use it from the command line, e.g.:

python utils/avg_checkpoints.py \
  --checkpoints path/to/checkpoint1,path/to/checkpoint2 \
  --num_last_checkpoints 2 \
  --output_path where/to/save/the/output

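Note that if the two checkpoints were trained on different datasets from scratch, the averaging would not work: independently initialized networks generally end up with weights that do not correspond to each other position by position, so their element-wise mean is not a meaningful model. If you had a single pre-trained model which you just fine-tuned on two different datasets, then the averaging could work.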
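To illustrate why averaging models trained from scratch tends to fail, here is a toy sketch in plain Python (not tensor2tensor code; the tiny network and all names are made up for the illustration). Two models compute exactly the same function but have their hidden units in a different order, and their element-wise average computes something else entirely:

```python
def relu(z):
    return max(0.0, z)


def predict(model, x):
    """Toy one-hidden-layer net: sum_j output[j] * relu(hidden[j] * x)."""
    return sum(v * relu(w * x) for w, v in zip(model["hidden"], model["output"]))


# Hypothetical weights. model_b is model_a with its two hidden units swapped,
# so both models represent the same function.
model_a = {"hidden": [1.0, -1.0], "output": [1.0, 2.0]}
model_b = {"hidden": [-1.0, 1.0], "output": [2.0, 1.0]}

# Element-wise mean of the two sets of weights.
averaged = {
    name: [(a + b) / 2 for a, b in zip(model_a[name], model_b[name])]
    for name in model_a
}

for x in (2.0, -3.0):
    print(x, predict(model_a, x), predict(model_b, x), predict(averaged, x))
```

The first two predictions agree for every input, while the averaged model has all hidden weights cancelled to zero. Fine-tuning from a shared pre-trained model usually avoids this because both checkpoints start from the same unit ordering.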

You can average more than two checkpoints. A hacky but simple way to give each checkpoint a different weight is to list it multiple times in --checkpoints (and increase --num_last_checkpoints accordingly).
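The repetition trick can be sketched in plain Python. This is only a stand-in for what a uniform checkpoint average does, assuming each checkpoint is a dict from variable name to a flat list of values; average_checkpoints and the variable names are hypothetical and not part of tensor2tensor:

```python
def average_checkpoints(checkpoints):
    """Element-wise uniform average over a list of {variable_name: [values]} dicts."""
    count = len(checkpoints)
    return {
        name: [
            sum(values) / count
            for values in zip(*(ckpt[name] for ckpt in checkpoints))
        ]
        for name in checkpoints[0]
    }


# Hypothetical checkpoints with identical variable names and shapes.
ckpt_a = {"dense/kernel": [1.0, 2.0], "dense/bias": [0.0]}
ckpt_b = {"dense/kernel": [5.0, 6.0], "dense/bias": [4.0]}

# Plain average: each checkpoint contributes 1/2.
uniform = average_checkpoints([ckpt_a, ckpt_b])

# Listing ckpt_a three times gives it weight 3/4 and ckpt_b weight 1/4.
weighted = average_checkpoints([ckpt_a, ckpt_a, ckpt_a, ckpt_b])

print(uniform)
print(weighted)
```

Because the average is uniform over the list entries, repeating an entry k times out of n gives it weight k/n, so only rational weights with a small common denominator are practical this way.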

Upvotes: 2
