Amir

Reputation: 2415

Ensemble two tensorflow models

I'm trying to create a single model out of two almost identical models, trained under different conditions, and average their outputs inside TensorFlow. We want the final model to have the same interface for inference.

We have saved a checkpoint of the two models, and here is how we are trying to solve the problem:

merged_graph = tf.Graph()
with merged_graph.as_default():
    saver1 = tf.train.import_meta_graph('path_to_checkpoint1_model1.meta', import_scope='g1')
    saver2 = tf.train.import_meta_graph('path_to_checkpoint1_model2.meta', import_scope='g2')

with tf.Session(graph=merged_graph) as sess:
  saver1.restore(sess, 'path_to_checkpoint1_model1')
  saver2.restore(sess, 'path_to_checkpoint1_model2')    

  sess.run(tf.global_variables_initializer())

  # export as a saved_model
  builder = tf.saved_model.builder.SavedModelBuilder(kPathToExportDir)
  builder.add_meta_graph_and_variables(sess,
                                       [tf.saved_model.tag_constants.SERVING],
                                       strip_default_attrs=True)    
  builder.save()

There are at least 3 flaws with the above approach, and we have tried many routes but can't get this to work:

  1. The graphs for model1 and model2 have their own main ops. As a result, the model fails during loading with the following error (Failed precondition):

Expected exactly one main op in : model
Expected exactly one SavedModel main op. Found: [u'g1/group_deps', u'g2/group_deps']

  2. The two models have their own Placeholder nodes for input (i.e. g1/Placeholder and g2/Placeholder after merging). We couldn't find a way to remove the Placeholder nodes and create a new one that feeds input to both models (we don't want a new interface where we need to feed data into two different placeholders).

  3. The two graphs have their own init_all and restore_all nodes. We couldn't figure out how to combine these NoOp operations into single nodes. This is the same kind of problem as #1.

We also couldn't find a sample implementation of such model ensembling inside TensorFlow. A sample code snippet might answer all the above questions.

Note: My two models were trained using tf.estimator.Estimator and exported as saved_models. As a result, they contain the main_op.

Upvotes: 1

Views: 3283

Answers (2)

Amir

Reputation: 2415

I did not solve the problem directly, but found a workaround for it.

The main problem is that a main_op node is added whenever a model is exported with the saved_model API. Since both of my models were exported with this API, both had a main_op node, and both nodes were imported into the new graph. The merged graph then contained two main_ops, which later fails to load because exactly one main op is expected.

The workaround I chose was not to export my final model with the saved_model API, but to export it with the old handy freeze_graph into a single .pb file.

Here is my working code snippet:

import os
import tensorflow as tf
from tensorflow.python.tools import freeze_graph

# set some constants:
#   INPUT_SHAPE, OUTPUT_NODE_NAME, OUTPUT_FILE_NAME, 
#   TEMP_DIR, TEMP_NAME, SCOPE_PREPEND_NAME, EXPORT_DIR

# Set path for trained models which are exported with the saved_model API
input_model_paths = [PATH_TO_MODEL1, 
                     PATH_TO_MODEL2, 
                     PATH_TO_MODEL3, ...]
num_model = len(input_model_paths)

def load_model(sess, path, scope, input_node):
    tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], 
                               path,
                               import_scope=scope, 
                               input_map={"Placeholder": input_node})  
    output_tensor = tf.get_default_graph().get_tensor_by_name(
        scope + "/" + OUTPUT_NODE_NAME + ":0")
    return output_tensor  

with tf.Session(graph=tf.Graph()) as sess:
  new_input = tf.placeholder(dtype=tf.float32, 
                             shape=INPUT_SHAPE, name="Placeholder")      

  output_tensors = []
  for k, path in enumerate(input_model_paths):
    output_tensors.append(load_model(sess, 
                                     path, 
                                     SCOPE_PREPEND_NAME+str(k), 
                                     new_input))
  # Mix together the outputs (e.g. sum, weighted sum, etc.)
  sum_outputs = output_tensors[0] + output_tensors[1]
  for i in range(2, num_model):
    sum_outputs = sum_outputs + output_tensors[i]
  final_output = tf.divide(sum_outputs, float(num_model), name=OUTPUT_NODE_NAME)

  # Save checkpoint to be loaded later by the freeze_graph!
  saver_checkpoint = tf.train.Saver()
  saver_checkpoint.save(sess, os.path.join(TEMP_DIR, TEMP_NAME))

  tf.train.write_graph(sess.graph_def, TEMP_DIR, TEMP_NAME + ".pbtxt")
  freeze_graph.freeze_graph(
      os.path.join(TEMP_DIR, TEMP_NAME + ".pbtxt"), 
      "", 
      False, 
      os.path.join(TEMP_DIR, TEMP_NAME),  
      OUTPUT_NODE_NAME, 
      "", # deprecated
      "", # deprecated
      os.path.join(EXPORT_DIR, OUTPUT_FILE_NAME),
      False,
      "")
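To use the frozen model afterwards, the .pb file has to be read back into a graph. Below is a minimal sketch of that loading step (my addition, not part of the workaround itself), written against the TF1-style API via tf.compat.v1; the node names in the usage comment are assumptions mirroring the constants above.

```python
import tensorflow.compat.v1 as tf

def load_frozen_graph(pb_path):
    """Parse a frozen GraphDef from disk and import it into a fresh tf.Graph."""
    graph_def = tf.GraphDef()
    with tf.io.gfile.GFile(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    graph = tf.Graph()
    with graph.as_default():
        # name="" keeps the original node names (no "import/" prefix)
        tf.import_graph_def(graph_def, name="")
    return graph

# Hypothetical usage, assuming the same node names as when freezing:
# graph = load_frozen_graph(os.path.join(EXPORT_DIR, OUTPUT_FILE_NAME))
# with tf.Session(graph=graph) as sess:
#     result = sess.run(OUTPUT_NODE_NAME + ":0",
#                       feed_dict={"Placeholder:0": some_numpy_batch})
```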

Upvotes: 0

Jie.Zhou

Reputation: 1318

For question 1, exporting with saved_model is not a must.

For question 2, the input_map argument of tf.train.import_meta_graph can be used.

For question 3, you really do not need the restore-all or initialize-all ops any more.

This code snippet shows how you can combine two graphs and average their outputs in TensorFlow:

import tensorflow as tf
merged_graph = tf.Graph()
with merged_graph.as_default():
    input = tf.placeholder(dtype=tf.float32, shape=WhatEverYourShape)
    saver1 = tf.train.import_meta_graph('path_to_checkpoint1_model1.meta', import_scope='g1',
                                        input_map={"YOUR/INPUT/NAME": input})
    saver2 = tf.train.import_meta_graph('path_to_checkpoint1_model2.meta', import_scope='g2',
                                        input_map={"YOUR/INPUT/NAME": input})

    output1 = merged_graph.get_tensor_by_name("g1/YOUR/OUTPUT/TENSOR/NAME:0")
    output2 = merged_graph.get_tensor_by_name("g2/YOUR/OUTPUT/TENSOR/NAME:0")
    final_output = (output1 + output2) / 2

with tf.Session(graph=merged_graph) as sess:
    saver1.restore(sess, 'path_to_checkpoint1_model1')
    saver2.restore(sess, 'path_to_checkpoint1_model2')
    # do NOT run this line: it would re-initialize all variables
    # and undo the restores above
    # sess.run(tf.global_variables_initializer())
    final_output_numpy = sess.run(final_output, feed_dict={input: YOUR_NUMPY_INPUT})
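The averaging step itself is ordinary elementwise arithmetic, and the uniform average above generalizes to a weighted sum. A quick numpy stand-in (illustration only, with made-up output values; not part of the TensorFlow graph):

```python
import numpy as np

# Hypothetical per-class scores produced by the two restored models
output1 = np.array([0.2, 0.7, 0.1])
output2 = np.array([0.4, 0.5, 0.1])

# Uniform average, matching final_output = (output1 + output2) / 2
uniform = (output1 + output2) / 2

# Weighted generalization: weights should sum to 1 so scores keep their scale
w1, w2 = 0.6, 0.4
weighted = w1 * output1 + w2 * output2
```

The same arithmetic works on TensorFlow tensors, so swapping the uniform average for a weighted one only changes the final_output line.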

Upvotes: 0
