aksingh2411

Reputation: 314

Accuracy is Decreasing Too Slowly with each Epoch in Tensorflow Federated Training

My TensorFlow Federated model is taking too long to converge. When I train the same model without the TFF wrapping, using plain TensorFlow 2.0, the accuracy reaches 0.97 within a few epochs. However, with TFF training the same model only reaches an accuracy of 0.03 after 30 epochs. What could be the reason for such low accuracy during TFF training, and is there a way to improve it? My code is given below:

import tensorflow as tf
import tensorflow_federated as tff

# Building the Federated Averaging process
iterative_process = tff.learning.build_federated_averaging_process(
  model_fn,
  client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
  server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0))

# Inspect the type signature of the initialize computation
print(str(iterative_process.initialize.type_signature))

# Initialize the server state
state = iterative_process.initialize()

# Run the first training round
state, metrics = iterative_process.next(state, federated_train_data)
print('round  1, metrics={}'.format(metrics))

# Run the remaining rounds
NUM_ROUNDS = 1000
for round_num in range(2, NUM_ROUNDS):
  state, metrics = iterative_process.next(state, federated_train_data)
  print('round {:2d}, metrics={}'.format(round_num, metrics))

Upvotes: 1

Views: 429

Answers (1)

Zachary Garrett

Reputation: 2941

There may be some mixing of terminology here: depending on what "epoch" means, this may be expected behavior in federated learning.

If "epoch" here is counting rounds (the for-loop in the code above): a round in federated learning is generally much smaller than an epoch in centralized learning. The global model is updated only once per round, and that update is computed from far fewer examples than the entire dataset. If a dataset has M examples divided over K clients, federated learning often selects only a few of those clients to participate in a round, so the round sees only some small multiple of M / K examples.

Contrast with centralized learning, in which an epoch over the same dataset with M examples and a training procedure using a batch size of N would advance the model M / N steps, and see all M examples.
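
As a rough back-of-the-envelope illustration (the numbers below are hypothetical, not taken from the question), the gap in model updates and examples seen per epoch versus per round looks like this:

# Hypothetical numbers, only to illustrate the scale difference.
M = 60_000                  # total examples across all clients
K = 100                     # total number of clients
N = 32                      # batch size

# Centralized learning: one epoch sees all M examples and applies M / N updates.
centralized_updates_per_epoch = M // N            # ~1875 model updates
centralized_examples_per_epoch = M                # 60,000 examples

# Federated averaging: one round selects a few clients and applies 1 global update.
clients_per_round = 10
federated_examples_per_round = clients_per_round * (M // K)   # ~6,000 examples
federated_global_updates_per_round = 1

print(centralized_updates_per_epoch, centralized_examples_per_epoch)
print(federated_global_updates_per_round, federated_examples_per_round)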

Generally it takes more rounds of federated learning to train a model than it takes epochs of centralized learning, which can be understood as a consequence of each round being much smaller.
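
If the goal is to make each round do more work, one common adjustment is to have every selected client take several local passes over its data before the server averages the updates. In TFF this is usually arranged when preprocessing the client datasets. The snippet below is only a minimal sketch, assuming the client data are tf.data.Dataset objects; CLIENT_EPOCHS, BATCH_SIZE, preprocess, and raw_client_datasets are illustrative names, not part of the original question:

CLIENT_EPOCHS = 5    # hypothetical: local passes over each client's data per round
BATCH_SIZE = 20      # hypothetical batch size

def preprocess(client_dataset):
  # Repeating the dataset makes each selected client train for CLIENT_EPOCHS
  # local epochs before its update is sent to the server for averaging.
  return client_dataset.repeat(CLIENT_EPOCHS).shuffle(100).batch(BATCH_SIZE)

federated_train_data = [preprocess(ds) for ds in raw_client_datasets]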

Upvotes: 2
