Reputation: 845
I have a frozen model and 4 GPUs. I would like to run inference on as much data as possible, as quickly as possible. I basically want data parallelism: the same model performing inference on 4 batches, one batch per GPU.
This is roughly what I am trying to do:
def return_ops():
    # load the graph
    with tf.Graph().as_default() as graph:
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(model_path, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def, name='')

    inputs = []
    outputs = []

    with graph.as_default() as g:
        for gpu in ['/gpu:0', '/gpu:1', '/gpu:2', '/gpu:3']:
            with tf.device(gpu):
                image_tensor = g.get_tensor_by_name('input:0')
                get_embeddings = g.get_tensor_by_name('embeddings:0')
                inputs.append(image_tensor)
                outputs.append(get_embeddings)
    return inputs, outputs, g
However, when I run
# sample batch
x = np.ones((100, 160, 160, 3))

# get ops
image_tensor_list, emb_list, graph = return_ops()

# construct feed dict
feed_dict = {it: x for it in image_tensor_list}

# run the ops
with tf.Session(graph=graph, config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    inf = sess.run(emb_list, feed_dict=feed_dict)
everything is running on /gpu:0 when inspecting with nvidia-smi.
I can, however, run
with tf.device("/gpu:1"):
    t = tf.range(1000)

with tf.Session() as sess:
    sess.run(t)
and there is activity on the second gpu...
How can I implement this data parallelism task properly?
Upvotes: 1
Views: 958
Reputation: 845
I learned that the placement of ops on a GPU needs to happen when the graph_def is imported. The code below returns ops that I can then run with sess.run([output1, ..., outputk], feed_dict). It places all operations on the GPU, which is not ideal for every op, so I set allow_soft_placement=True in the session config.
class MultiGPUNet(object):
    def __init__(self, model_path, n_gpu):
        self.model_path = model_path
        self.n_gpu = n_gpu
        self.graph = tf.Graph()

        # specify device for n_gpu copies of model
        # during graphdef parsing
        for i in range(self.n_gpu):
            self._init_models(i, self.graph)

    def _init_models(self, i, graph):
        with self.graph.as_default():
            od_graph_def = tf.GraphDef()
            with tf.gfile.GFile(self.model_path, 'rb') as fid:
                serialized_graph = fid.read()
                od_graph_def.ParseFromString(serialized_graph)
                with tf.device('/device:GPU:{}'.format(i)):
                    tf.import_graph_def(od_graph_def, name='{}'.format(i))

    def get_tensors(self):
        output_tensors = []
        input_tensors = []
        train_tensors = []

        for i in range(self.n_gpu):
            input_tensors.append(
                self.graph.get_tensor_by_name('{}/<input_name>:0'.format(i)))
            output_tensors.append(
                self.graph.get_tensor_by_name('{}/<out_name>:0'.format(i)))
            train_tensors.append(
                self.graph.get_tensor_by_name('{}/<train_name>:0'.format(i)))

        def make_feed_dict(x):
            """x will be a list of batches, one per GPU copy"""
            assert len(x) == len(input_tensors)
            input_data = list(zip(input_tensors, x))
            train_bool = list(zip(train_tensors, [False] * len(train_tensors)))
            return dict(input_data + train_bool)

        return output_tensors, make_feed_dict
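For completeness, a minimal usage sketch follows. The model path 'frozen_model.pb' and the batch shape are placeholders for illustration, and the <input_name>/<out_name>/<train_name> placeholders above still need to be replaced with the actual tensor names from your graph:

import numpy as np
import tensorflow as tf

# build one copy of the frozen graph per GPU
net = MultiGPUNet('frozen_model.pb', n_gpu=4)
output_tensors, make_feed_dict = net.get_tensors()

# split one large batch into one sub-batch per GPU
x = np.ones((100, 160, 160, 3))
batches = np.array_split(x, net.n_gpu)

# soft placement lets ops without a GPU kernel fall back to the CPU
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(graph=net.graph, config=config) as sess:
    embeddings = sess.run(output_tensors, feed_dict=make_feed_dict(batches))

Each element of embeddings then corresponds to the output of one GPU's copy of the model on its sub-batch.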
Upvotes: 3