Question about grpc+gdr vs. grpc+verbs in distributed TensorFlow

Question

When I use distributed TensorFlow, grpc+gdr performs worse than grpc+verbs, even though nv_peer_mem is loaded. I don't understand the difference between grpc+verbs and grpc+gdr. Can anyone help? Some output is below:

root@s36-2288H-V5:~# /etc/init.d/nv_peer_mem status

nv_peer_mem module is loaded.

My launch command is as follows:

python /root/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
--server_protocol=grpc+verbs \
--model=vgg16 --variable_update=parameter_server \
--batch_size=64 --num_batches=50 --num_warmup_batches=10 \
--local_parameter_device=gpu --num_gpus=1 \
--job_name=ps --task_index=0 \
--ps_hosts=172.168.30.25:10011 \
--worker_hosts=172.168.30.26:50012 &

When I set --server_protocol=grpc+gdr instead, the performance is worse.
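
To rule out any accidental difference between the two runs, the commands could be generated from one shared flag list so that only the protocol changes. This is just a minimal sketch: the flags and paths are copied from my command, while the names BENCHMARK, COMMON_FLAGS and build_command are made up for illustration and are not part of tf_cnn_benchmarks.

```python
import shlex

# Path taken from the launch command in this post.
BENCHMARK = "/root/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py"

# Flags copied from the launch command; only --server_protocol,
# --job_name and --task_index vary between invocations.
COMMON_FLAGS = [
    "--model=vgg16",
    "--variable_update=parameter_server",
    "--batch_size=64",
    "--num_batches=50",
    "--num_warmup_batches=10",
    "--local_parameter_device=gpu",
    "--num_gpus=1",
    "--ps_hosts=172.168.30.25:10011",
    "--worker_hosts=172.168.30.26:50012",
]


def build_command(protocol, job_name, task_index):
    """Return the argv list for one task (hypothetical helper)."""
    return (
        ["python", BENCHMARK, f"--server_protocol={protocol}"]
        + COMMON_FLAGS
        + [f"--job_name={job_name}", f"--task_index={task_index}"]
    )


if __name__ == "__main__":
    # Print the ps and worker commands for both protocols so they can be
    # compared line by line before running them on each host.
    for protocol in ("grpc+verbs", "grpc+gdr"):
        for job_name in ("ps", "worker"):
            print(shlex.join(build_command(protocol, job_name, 0)))
```

The script only prints the commands (shlex.join needs Python 3.8 or later); each one still has to be started on the matching host by hand.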

Answers (0)
