Reputation: 1070
In distributed TensorFlow, we can run multiple clients that work with workers in a parameter-server architecture, which is known as "between-graph replication". According to the documentation,
Between-graph replication. In this approach, there is a separate client for each /job:worker task, typically in the same process as the worker task.
So the client and the worker are typically in the same process. However, if they are not in the same process, can the number of clients differ from the number of workers? Also, can multiple clients share and run on the same CPU core?
Upvotes: 0
Views: 224
Reputation: 738
Clients are the Python programs that define a graph and create a session in order to run computations. When you start these programs, the resulting processes represent the servers in the distributed architecture.
It is also possible to write programs that do not create a graph and do not run a session, but just call the server.join() method after creating a server with the appropriate job name and task index. This way you could theoretically have a single client that defines the whole graph and starts a session with its corresponding server.target; within this session, parts of the graph are automatically sent to the other processes/servers, which do the computations (as long as you have specified which server/task does what). This setup is the in-graph replication architecture.
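As a rough illustration of the in-graph setup described above, here is a minimal sketch. The host:port addresses, the helper names (task_device, serve_only, single_client) and the toy computation are all made up for the example, and it assumes the TF 1.x-style API exposed through tensorflow.compat.v1. Every ps/worker process would call serve_only; only one process would run single_client.

```python
# Hypothetical cluster layout; replace with real host:port pairs.
CLUSTER = {
    "ps": ["localhost:2222"],
    "worker": ["localhost:2223", "localhost:2224"],
}


def task_device(job_name, task_index):
    """Device string that pins ops to one task of one job."""
    return "/job:%s/task:%d" % (job_name, task_index)


def serve_only(job_name, task_index):
    """Process with no graph and no session: it only hosts a server."""
    # Imported lazily so the helpers above work without TensorFlow installed.
    import tensorflow.compat.v1 as tf

    server = tf.train.Server(tf.train.ClusterSpec(CLUSTER),
                             job_name=job_name, task_index=task_index)
    server.join()  # blocks and waits for work sent by a client


def single_client(target="grpc://localhost:2223"):
    """The single client: builds the whole graph and assigns parts to tasks."""
    import tensorflow.compat.v1 as tf

    tf.disable_eager_execution()
    # The shared variable lives on the parameter server.
    with tf.device(task_device("ps", 0)):
        w = tf.get_variable("w", shape=[], initializer=tf.zeros_initializer())
    # One piece of the graph is explicitly placed on each worker task.
    partial = []
    for i in range(len(CLUSTER["worker"])):
        with tf.device(task_device("worker", i)):
            partial.append(w + float(i))
    total = tf.add_n(partial)
    # The session talks to one server, which dispatches the rest.
    with tf.Session(target) as sess:
        sess.run(tf.global_variables_initializer())
        return sess.run(total)
```

The explicit tf.device blocks are the "set which server/task is going to do what" part: without them, the client gives the runtime no hint about where each op should live.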
So it is basically possible to start several servers/processes on the same machine, even one that has only a single CPU, but you are not going to gain much parallelism, because context switching between the running processes is going to slow you down. Unless the servers are doing unrelated work, you should avoid this kind of setup.
Between-graph replication just means that every worker has its own client and runs its own session.
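For comparison, a between-graph worker could look roughly like the sketch below. Each worker process would run the same script with a different task index, so each one is its own client with its own graph and session. The addresses, the helper names (worker_device, run_worker) and the toy loss are assumptions for illustration, again using the TF 1.x-style API from tensorflow.compat.v1; the ps task is assumed to be running separately and calling server.join().

```python
# Hypothetical cluster layout; replace with real host:port pairs.
CLUSTER = {
    "ps": ["localhost:2222"],
    "worker": ["localhost:2223", "localhost:2224"],
}


def worker_device(task_index):
    """Device string for the worker task that runs this client."""
    return "/job:worker/task:%d" % task_index


def run_worker(task_index, steps=10):
    """One worker process: client and server live in the same process."""
    # Imported lazily so the helper above works without TensorFlow installed.
    import tensorflow.compat.v1 as tf

    tf.disable_eager_execution()
    cluster = tf.train.ClusterSpec(CLUSTER)
    server = tf.train.Server(cluster, job_name="worker", task_index=task_index)

    # Variables go to the ps job, the remaining ops stay on this worker.
    with tf.device(tf.train.replica_device_setter(
            worker_device=worker_device(task_index), cluster=cluster)):
        global_step = tf.train.get_or_create_global_step()
        w = tf.get_variable("w", shape=[], initializer=tf.zeros_initializer())
        loss = tf.square(w - 1.0)  # toy objective, for illustration only
        train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
            loss, global_step=global_step)

    # Each worker opens its own session against its own server.
    with tf.train.MonitoredTrainingSession(
            master=server.target, is_chief=(task_index == 0)) as sess:
        for _ in range(steps):
            sess.run(train_op)
```

Here every worker builds a similar graph on its own, and the workers only meet through the variables that replica_device_setter places on the ps job.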
Upvotes: 1