Reputation: 296
I would like to know whether there is a recommended way of measuring execution time in TensorFlow Federated. To be more specific: if one would like to extract the execution time for each client in a certain round, e.g., for each client involved in a FedAvg round, saving the time stamp before the local training starts and the time stamp just before sending back the updates, what is the best (or just correct) strategy to do this? Furthermore, since the clients' code runs in parallel, are such time stamps untruthful (especially considering the hypothesis that different clients may be using differently sized models for local training)?
To be very practical: is calling tf.timestamp() at the beginning and at the end of the @tf.function client_update(model, dataset, server_message, client_optimizer) -- this is probably a simplified signature -- and then subtracting the two time stamps an appropriate approach?
I have the feeling that this is not the right way to do this, given that clients run in parallel on the same machine.
Thanks to anyone who can help me with this.
Upvotes: 1
Views: 653
Reputation: 2941
There are multiple potential places to measure execution time; a first step might be defining very specifically what the intended measurement is.
Measuring the training time of each client as proposed is a great way to get a sense of the variability among clients. This could help identify whether rounds frequently have stragglers. Using tf.timestamp() at the beginning and end of the client_update function seems reasonable. As the question correctly notes, this happens in parallel, so summing all of these times would be akin to CPU time rather than elapsed real time.
Measuring the time it takes to complete all client training in a round would generally be the maximum of the per-client values above. This might not hold when simulating FL in TFF, since TFF may decide to run some clients sequentially due to system resource constraints; in practice (i.e., in a real deployment), all of these clients would run in parallel.
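For example, given hypothetical per-client durations collected in one round, the sum is the CPU-time-like quantity above, while the maximum approximates the round duration under true parallelism:

    # Hypothetical per-client training durations (in seconds) for one round.
    client_durations = [1.8, 2.1, 5.4, 1.9]

    cpu_time_like = sum(client_durations)     # total compute spent across all clients
    ideal_round_time = max(client_durations)  # round duration if clients ran fully in parallel

    print(cpu_time_like, ideal_round_time)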
Measuring the time it takes to complete a full round (the maximum time it takes to run a client, plus the time it takes for the server to update) could be done by moving the tf.timestamp calls to the outer training loop, i.e., wrapping the call to trainer.next() in the snippet on https://www.tensorflow.org/federated. This would be most similar to elapsed real time (wall clock time).
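A sketch of that, assuming trainer and train_data already exist as in that snippet (an iterative process and a list of client datasets, respectively); Python's time.time() is used here, which serves the same wall-clock purpose as tf.timestamp() in the outer, eager loop:

    import time

    state = trainer.initialize()
    for round_num in range(5):
      start = time.time()                        # wall-clock time before the round starts
      state, metrics = trainer.next(state, train_data)
      elapsed = time.time() - start              # clients' training plus the server update
      print('round {}: {:.2f}s, metrics={}'.format(round_num, elapsed, metrics))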
Upvotes: 1