Reputation: 988
Given that I trained several different models on the same data, and all the neural networks have the same architecture, I would like to know whether it is possible to restore those models, average their weights, and initialise my weights with that average.
This is an example of how the graph might look. Basically, what I need is an average of the weights I am going to load.
import tensorflow as tf
import numpy as np

# init model1 weights (shapes/initializers omitted in this sketch)
weights = {
    'w1': tf.Variable(),
    'w2': tf.Variable()
}
# init model1 biases
biases = {
    'b1': tf.Variable(),
    'b2': tf.Variable()
}
# init model2 weights
weights2 = {
    'w1': tf.Variable(),
    'w2': tf.Variable()
}
# init model2 biases
biases2 = {
    'b1': tf.Variable(),
    'b2': tf.Variable()
}
# this is the average I want to create
w = {
    'w1': tf.Variable(
        tf.add(weights['w1'], weights2['w1']) / 2
    ),
    'w2': tf.Variable(
        tf.add(weights['w2'], weights2['w2']) / 2
    )
}
# init the averaged biases
b = {
    'b1': tf.Variable(
        tf.add(biases['b1'], biases2['b1']) / 2
    ),
    'b2': tf.Variable(
        tf.add(biases['b2'], biases2['b2']) / 2
    )
}
weights_saver = tf.train.Saver({
    'w1': weights['w1'],
    'w2': weights['w2'],
    'b1': biases['b1'],
    'b2': biases['b2']
})
weights_saver2 = tf.train.Saver({
    'w1': weights2['w1'],
    'w2': weights2['w2'],
    'b1': biases2['b1'],
    'b2': biases2['b2']
})
And this is what I want to get when I run the TF session: c contains the weights I want to use in order to start the training.
# Create a session for running operations in the Graph.
init_op = tf.global_variables_initializer()
init_op2 = tf.local_variables_initializer()
with tf.Session() as sess:
    coord = tf.train.Coordinator()
    # Initialize the variables (like the epoch counter).
    sess.run(init_op)
    sess.run(init_op2)
    weights_saver.restore(
        sess,
        'my_model1/model_weights.ckpt'
    )
    weights_saver2.restore(
        sess,
        'my_model2/model_weights.ckpt'
    )
    a = sess.run(weights)
    b = sess.run(weights2)
    c = sess.run(w)
Upvotes: 3
Views: 2588
Reputation: 68140
You can implement this in a very generic way for any checkpoint and any model by using tf.train.list_variables and tf.train.load_checkpoint.
You can find an example here.
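For instance, a minimal sketch of that approach could look like the code below. It averages every variable across the two checkpoints from the question; the loop structure and names are my own illustration, not the code behind the link:
import numpy as np
import tensorflow as tf

# Checkpoints to average (paths taken from the question).
checkpoint_paths = [
    'my_model1/model_weights.ckpt',
    'my_model2/model_weights.ckpt'
]

# Collect every variable's value from every checkpoint.
values_by_name = {}
for path in checkpoint_paths:
    reader = tf.train.load_checkpoint(path)
    for name, _ in tf.train.list_variables(path):
        values_by_name.setdefault(name, []).append(reader.get_tensor(name))

# Average across checkpoints and create variables initialised with the mean.
averaged = {
    name: tf.Variable(np.mean(values, axis=0), name=name)
    for name, values in values_by_name.items()
}
This generalises to any number of checkpoints and never needs the model definition, only the checkpoint files.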
Upvotes: 0
Reputation: 11968
First, I assume the model structure is exactly the same (same number of layers, same number of nodes per layer). If not, you will have problems mapping variables (there will be variables in one model but not in the other).
What you want to do is have 3 sessions. The first 2 you load from checkpoints; the last one will hold the average. You want this because each session will contain its own version of the values of the variables.
After you load a model, use tf.trainable_variables() to get a list of all the variables in the model. You can pass it to sess.run to get the variables as numpy arrays. After you compute the averages, use tf.assign to create operations that change the variables. You can also use the list to change the initializers, but that means passing it into the model (not always an option).
Roughly:
graph = tf.Graph()
# One session per trained model, plus a third to hold the average.
session1 = tf.Session()
session2 = tf.Session()
session3 = tf.Session()
# Omitted code: Restore session1 and session2.
# Optionally initialize session3.
all_vars = tf.trainable_variables()
# Fetch the variables of each restored model as numpy arrays.
values1 = session1.run(all_vars)
values2 = session2.run(all_vars)
# Build assign ops that overwrite each variable with the element-wise mean.
all_assign = []
for var, val1, val2 in zip(all_vars, values1, values2):
    all_assign.append(tf.assign(var, tf.reduce_mean([val1, val2], axis=0)))
# Running the assign ops in session3 stores the averaged values there.
session3.run(all_assign)
# Do whatever you want with session 3.
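For the restore step the sketch leaves out, something along these lines should work (a hedged sketch: it reuses the checkpoint paths from the question and assumes the graph's variable names match the names stored in the checkpoints):
# Hypothetical restore for the omitted step above; paths are from the
# question, and variable names are assumed to match the checkpoints.
saver = tf.train.Saver()
saver.restore(session1, 'my_model1/model_weights.ckpt')
saver.restore(session2, 'my_model2/model_weights.ckpt')
session3.run(tf.global_variables_initializer())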
Upvotes: 6