Marlen

Problem in Evaluation of TensorFlow Federated Model

I am working on eavesdropper detection in a B5G system with deep learning. The training dataset contains 1200 CSI images of legitimate and malicious UEs.

I have created a DCNN model (with Keras Sequential()) that takes the images as input, and I use it with federated learning. For this, I divide the data among the BSs, assigning each UE's data to its nearest BS.
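A minimal sketch of the nearest-BS split described above. It assumes that each sample is tagged with the id of the UE that produced it and that UE and BS positions are known as 2-D coordinates; `assign_to_nearest_bs`, `split_by_bs` and their arguments are hypothetical names, not part of the original code.

```python
import numpy as np

def assign_to_nearest_bs(ue_positions, bs_positions):
    """Return, for each UE, the index of its nearest BS (Euclidean distance)."""
    # Pairwise UE-to-BS distances, shape (num_ues, num_bss)
    diff = ue_positions[:, None, :] - bs_positions[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)
    return np.argmin(dists, axis=1)

def split_by_bs(images, labels, ue_ids, ue_positions, bs_positions):
    """Group samples into one (images, labels) pair per BS (hypothetical helper)."""
    nearest = assign_to_nearest_bs(ue_positions, bs_positions)
    sample_bs = nearest[ue_ids]  # BS of the UE that produced each sample
    return [(images[sample_bs == b], labels[sample_bs == b])
            for b in range(len(bs_positions))]
```

Each resulting (images, labels) pair can then become one client dataset, for example with `tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size)`.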

As you can see below, training works well (if I train for more rounds, I can reach 97% accuracy, the same as with the corresponding centralized learning (CL) model).

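However, the validation is frozen! The validation accuracy of each client stays exactly the same in every round (0.6262626, 0.6086956 and 0.5), even though the validation loss changes. Even if I pass train_datasets instead of val_datasets to evaluation_obj, I get the same problem, only with different (but still frozen) accuracy values, while the training metrics reported by iterative_process keep improving.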

Here is the relevant code. I have tried everything I could think of (data augmentation, class balancing, extra dropout layers in the model), but nothing helped.

import tensorflow as tf
import tensorflow_federated as tff

# create_dcnn_model() builds the Keras Sequential DCNN; train_datasets and
# val_datasets are lists with one batched tf.data.Dataset of (image, label) per BS.

def model_fn():
    # TFF requires model_fn to build a fresh Keras model on every call
    keras_model = create_dcnn_model()
    return tff.learning.models.from_keras_model(
        keras_model,
        input_spec=train_datasets[0].element_spec,
        loss=tf.keras.losses.BinaryCrossentropy(),
        metrics=[tf.keras.metrics.BinaryAccuracy()]
    )

# Weighted FedAvg: Adam on the clients, SGD with momentum on the server
iterative_process = tff.learning.algorithms.build_weighted_fed_avg(
    model_fn,
    client_optimizer_fn=tff.learning.optimizers.build_adam(learning_rate=0.0001, beta_1=0.75),
    server_optimizer_fn=tff.learning.optimizers.build_sgdm(learning_rate=0.1, momentum=0.9),
)
state = iterative_process.initialize()

# Federated evaluation built from the same model_fn
evaluation_obj = tff.learning.algorithms.build_fed_eval(model_fn)
evaluation_state = evaluation_obj.initialize()

NUM_ROUNDS = 5
local_epochs = 3
metrics_per_round = []

for round_num in range(1, NUM_ROUNDS + 1):

    # Debugging: sum of every trainable weight tensor before training
    print(f"Round {round_num} - Before training:")
    weights_before = iterative_process.get_model_weights(state).trainable
    print([tf.reduce_sum(w).numpy() for w in weights_before])

    # Each call to next() is one complete federated round (broadcast, one local
    # pass over every client's dataset, aggregation), so this inner loop runs
    # local_epochs federated rounds rather than local epochs.
    for epoch in range(local_epochs):
        result = iterative_process.next(state, train_datasets)
        state = result.state
        train_metrics = result.metrics
        metrics_per_round.append(train_metrics)
        print('TRAINING round {:2d}, epoch {:2d}, metrics={}'.format(round_num, epoch, train_metrics))

    # Debugging: the same sums after training
    print(f"Round {round_num} - After training:")
    weights_after = iterative_process.get_model_weights(state).trainable
    print([tf.reduce_sum(w).numpy() for w in weights_after])

    # Load the current global weights into the evaluation process,
    # then validate on each client separately
    model_weights = iterative_process.get_model_weights(state)
    evaluation_state = evaluation_obj.set_model_weights(evaluation_state, model_weights)
    for i, dataset in enumerate(val_datasets):
        validation = evaluation_obj.next(evaluation_state, [dataset])
        print(f'VALIDATION Client {i}, round {round_num}, metrics={validation.metrics}')

Output:

Round 1 - Before training:
[-5.1930037, 0.0, 64.0, 0.0, -6.5921564, 0.0, 128.0, 0.0, 1.7643318, 0.0, 256.0, 0.0, -1.109535, 0.0]
TRAINING round  1, epoch  0, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('binary_accuracy', 0.67), ('loss', 0.6297659), ('num_examples', 1200), ('num_batches', 76)]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', OrderedDict([('update_non_finite', 0)]))])
TRAINING round  1, epoch  1, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('binary_accuracy', 0.6716667), ('loss', 0.62734866), ('num_examples', 1200), ('num_batches', 76)]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', OrderedDict([('update_non_finite', 0)]))])
TRAINING round  1, epoch  2, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('binary_accuracy', 0.68083334), ('loss', 0.6126825), ('num_examples', 1200), ('num_batches', 76)]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', OrderedDict([('update_non_finite', 0)]))])
Round 1 - After training:
[-5.198599, 0.00015918468, 64.00052, -0.0012102914, -4.846347, 3.4780123e-07, 127.999535, 0.0019988809, 3.007872, -1.1763255e-05, 256.00336, 0.0037695481, -1.2383966, -0.000516126]
VALIDATION Client 0, round 1, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('eval', OrderedDict([('current_round_metrics', OrderedDict([('binary_accuracy', 0.6262626), ('loss', 0.6813371), ('num_examples', 99), ('num_batches', 7)])), ('total_rounds_metrics', OrderedDict([('binary_accuracy', 0.6262626), ('loss', 0.6813371), ('num_examples', 99), ('num_batches', 7)]))]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', ())])
VALIDATION Client 1, round 1, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('eval', OrderedDict([('current_round_metrics', OrderedDict([('binary_accuracy', 0.6086956), ('loss', 0.6821681), ('num_examples', 115), ('num_batches', 8)])), ('total_rounds_metrics', OrderedDict([('binary_accuracy', 0.6086956), ('loss', 0.6821681), ('num_examples', 115), ('num_batches', 8)]))]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', ())])
VALIDATION Client 2, round 1, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('eval', OrderedDict([('current_round_metrics', OrderedDict([('binary_accuracy', 0.5), ('loss', 0.69437194), ('num_examples', 86), ('num_batches', 6)])), ('total_rounds_metrics', OrderedDict([('binary_accuracy', 0.5), ('loss', 0.69437194), ('num_examples', 86), ('num_batches', 6)]))]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', ())])
Round 2 - Before training:
[-5.198599, 0.00015918468, 64.00052, -0.0012102914, -4.846347, 3.4780123e-07, 127.999535, 0.0019988809, 3.007872, -1.1763255e-05, 256.00336, 0.0037695481, -1.2383966, -0.000516126]
TRAINING round  2, epoch  0, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('binary_accuracy', 0.68833333), ('loss', 0.6058925), ('num_examples', 1200), ('num_batches', 76)]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', OrderedDict([('update_non_finite', 0)]))])
TRAINING round  2, epoch  1, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('binary_accuracy', 0.6933333), ('loss', 0.6011166), ('num_examples', 1200), ('num_batches', 76)]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', OrderedDict([('update_non_finite', 0)]))])
TRAINING round  2, epoch  2, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('binary_accuracy', 0.7058333), ('loss', 0.58903444), ('num_examples', 1200), ('num_batches', 76)]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', OrderedDict([('update_non_finite', 0)]))])
Round 2 - After training:
[-5.037955, 0.00021043388, 64.00128, -0.008516019, -3.0151868, -5.603911e-07, 127.9972, 0.0029792283, 6.190685, -3.7346028e-05, 256.01352, 0.013353077, -1.4950545, -0.001535288]
VALIDATION Client 0, round 2, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('eval', OrderedDict([('current_round_metrics', OrderedDict([('binary_accuracy', 0.6262626), ('loss', 0.6703023), ('num_examples', 99), ('num_batches', 7)])), ('total_rounds_metrics', OrderedDict([('binary_accuracy', 0.6262626), ('loss', 0.6703023), ('num_examples', 99), ('num_batches', 7)]))]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', ())])
VALIDATION Client 1, round 2, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('eval', OrderedDict([('current_round_metrics', OrderedDict([('binary_accuracy', 0.6086956), ('loss', 0.67234236), ('num_examples', 115), ('num_batches', 8)])), ('total_rounds_metrics', OrderedDict([('binary_accuracy', 0.6086956), ('loss', 0.67234236), ('num_examples', 115), ('num_batches', 8)]))]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', ())])
VALIDATION Client 2, round 2, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('eval', OrderedDict([('current_round_metrics', OrderedDict([('binary_accuracy', 0.5), ('loss', 0.6992505), ('num_examples', 86), ('num_batches', 6)])), ('total_rounds_metrics', OrderedDict([('binary_accuracy', 0.5), ('loss', 0.6992505), ('num_examples', 86), ('num_batches', 6)]))]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', ())])
Round 3 - Before training:
[-5.037955, 0.00021043388, 64.00128, -0.008516019, -3.0151868, -5.603911e-07, 127.9972, 0.0029792283, 6.190685, -3.7346028e-05, 256.01352, 0.013353077, -1.4950545, -0.001535288]
TRAINING round  3, epoch  0, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('binary_accuracy', 0.7083333), ('loss', 0.5766273), ('num_examples', 1200), ('num_batches', 76)]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', OrderedDict([('update_non_finite', 0)]))])
TRAINING round  3, epoch  1, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('binary_accuracy', 0.6925), ('loss', 0.577319), ('num_examples', 1200), ('num_batches', 76)]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', OrderedDict([('update_non_finite', 0)]))])
TRAINING round  3, epoch  2, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('binary_accuracy', 0.69916666), ('loss', 0.5600956), ('num_examples', 1200), ('num_batches', 76)]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', OrderedDict([('update_non_finite', 0)]))])
Round 3 - After training:
[-4.7960944, 0.00038768398, 64.00191, -0.023654565, -3.3896022, 6.6397965e-06, 127.99365, 0.0012963403, 11.357792, -6.6096305e-05, 256.03156, 0.026010748, -1.7677606, -0.0025973155]
VALIDATION Client 0, round 3, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('eval', OrderedDict([('current_round_metrics', OrderedDict([('binary_accuracy', 0.6262626), ('loss', 0.66253), ('num_examples', 99), ('num_batches', 7)])), ('total_rounds_metrics', OrderedDict([('binary_accuracy', 0.6262626), ('loss', 0.66253), ('num_examples', 99), ('num_batches', 7)]))]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', ())])
VALIDATION Client 1, round 3, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('eval', OrderedDict([('current_round_metrics', OrderedDict([('binary_accuracy', 0.6086956), ('loss', 0.665855), ('num_examples', 115), ('num_batches', 8)])), ('total_rounds_metrics', OrderedDict([('binary_accuracy', 0.6086956), ('loss', 0.665855), ('num_examples', 115), ('num_batches', 8)]))]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', ())])
VALIDATION Client 2, round 3, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('eval', OrderedDict([('current_round_metrics', OrderedDict([('binary_accuracy', 0.5), ('loss', 0.71379626), ('num_examples', 86), ('num_batches', 6)])), ('total_rounds_metrics', OrderedDict([('binary_accuracy', 0.5), ('loss', 0.71379626), ('num_examples', 86), ('num_batches', 6)]))]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', ())])
Round 4 - Before training:
[-4.7960944, 0.00038768398, 64.00191, -0.023654565, -3.3896022, 6.6397965e-06, 127.99365, 0.0012963403, 11.357792, -6.6096305e-05, 256.03156, 0.026010748, -1.7677606, -0.0025973155]
TRAINING round  4, epoch  0, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('binary_accuracy', 0.7158333), ('loss', 0.5495191), ('num_examples', 1200), ('num_batches', 76)]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', OrderedDict([('update_non_finite', 0)]))])
TRAINING round  4, epoch  1, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('binary_accuracy', 0.71416664), ('loss', 0.54308075), ('num_examples', 1200), ('num_batches', 76)]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', OrderedDict([('update_non_finite', 0)]))])
TRAINING round  4, epoch  2, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('binary_accuracy', 0.7241667), ('loss', 0.5290633), ('num_examples', 1200), ('num_batches', 76)]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', OrderedDict([('update_non_finite', 0)]))])
Round 4 - After training:
[-4.523261, 0.00081014796, 64.00284, -0.04872443, -7.120768, 1.5778049e-05, 127.98874, -0.00421518, 18.820229, -0.00010246113, 256.05884, 0.041554034, -1.9932203, -0.0034644678]
VALIDATION Client 0, round 4, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('eval', OrderedDict([('current_round_metrics', OrderedDict([('binary_accuracy', 0.6262626), ('loss', 0.6637882), ('num_examples', 99), ('num_batches', 7)])), ('total_rounds_metrics', OrderedDict([('binary_accuracy', 0.6262626), ('loss', 0.6637882), ('num_examples', 99), ('num_batches', 7)]))]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', ())])
VALIDATION Client 1, round 4, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('eval', OrderedDict([('current_round_metrics', OrderedDict([('binary_accuracy', 0.6086956), ('loss', 0.6679278), ('num_examples', 115), ('num_batches', 8)])), ('total_rounds_metrics', OrderedDict([('binary_accuracy', 0.6086956), ('loss', 0.6679278), ('num_examples', 115), ('num_batches', 8)]))]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', ())])
VALIDATION Client 2, round 4, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('eval', OrderedDict([('current_round_metrics', OrderedDict([('binary_accuracy', 0.5), ('loss', 0.7374798), ('num_examples', 86), ('num_batches', 6)])), ('total_rounds_metrics', OrderedDict([('binary_accuracy', 0.5), ('loss', 0.7374798), ('num_examples', 86), ('num_batches', 6)]))]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', ())])
Round 5 - Before training:
[-4.523261, 0.00081014796, 64.00284, -0.04872443, -7.120768, 1.5778049e-05, 127.98874, -0.00421518, 18.820229, -0.00010246113, 256.05884, 0.041554034, -1.9932203, -0.0034644678]
TRAINING round  5, epoch  0, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('binary_accuracy', 0.7325), ('loss', 0.5184692), ('num_examples', 1200), ('num_batches', 76)]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', OrderedDict([('update_non_finite', 0)]))])
TRAINING round  5, epoch  1, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('binary_accuracy', 0.74583334), ('loss', 0.50279176), ('num_examples', 1200), ('num_batches', 76)]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', OrderedDict([('update_non_finite', 0)]))])
TRAINING round  5, epoch  2, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('binary_accuracy', 0.7375), ('loss', 0.5011976), ('num_examples', 1200), ('num_batches', 76)]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', OrderedDict([('update_non_finite', 0)]))])
Round 5 - After training:
[-4.2883253, 0.0017090045, 64.003105, -0.08713496, -14.093327, 2.268095e-05, 127.98152, -0.01382696, 28.69728, -0.0001238601, 256.10028, 0.06240599, -2.1537747, -0.00408097]
VALIDATION Client 0, round 5, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('eval', OrderedDict([('current_round_metrics', OrderedDict([('binary_accuracy', 0.6262626), ('loss', 0.6729642), ('num_examples', 99), ('num_batches', 7)])), ('total_rounds_metrics', OrderedDict([('binary_accuracy', 0.6262626), ('loss', 0.6729642), ('num_examples', 99), ('num_batches', 7)]))]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', ())])
VALIDATION Client 1, round 5, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('eval', OrderedDict([('current_round_metrics', OrderedDict([('binary_accuracy', 0.6086956), ('loss', 0.6771792), ('num_examples', 115), ('num_batches', 8)])), ('total_rounds_metrics', OrderedDict([('binary_accuracy', 0.6086956), ('loss', 0.6771792), ('num_examples', 115), ('num_batches', 8)]))]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', ())])
VALIDATION Client 2, round 5, metrics=OrderedDict([('distributor', ()), ('client_work', OrderedDict([('eval', OrderedDict([('current_round_metrics', OrderedDict([('binary_accuracy', 0.5), ('loss', 0.76686454), ('num_examples', 86), ('num_batches', 6)])), ('total_rounds_metrics', OrderedDict([('binary_accuracy', 0.5), ('loss', 0.76686454), ('num_examples', 86), ('num_batches', 6)]))]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', ())])
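The frozen accuracies are exactly 62/99 ≈ 0.6262626, 70/115 ≈ 0.6086956 and 43/86 = 0.5. That is consistent with (but does not prove) a model that assigns the same class to every validation sample of a client: accuracy then equals that class's share and cannot move, while the loss still changes. A minimal numpy sketch to check this, assuming you can obtain the sigmoid outputs for one client's validation set (for example via `predict` on a Keras copy of the global model); `class_share` and `prediction_summary` are hypothetical helpers. The `> 0.5` comparison mirrors the default threshold of `tf.keras.metrics.BinaryAccuracy`.

```python
import numpy as np

def class_share(labels):
    """Fraction of samples in class 0 and class 1 for binary labels."""
    labels = np.asarray(labels).astype(int).ravel()
    return np.bincount(labels, minlength=2) / labels.size

def prediction_summary(probs, labels, threshold=0.5):
    """Compare thresholded sigmoid outputs with the true labels."""
    preds = (np.asarray(probs).ravel() > threshold).astype(int)
    labels = np.asarray(labels).astype(int).ravel()
    return {
        "accuracy": float(np.mean(preds == labels)),
        "predicted_share": np.bincount(preds, minlength=2) / preds.size,
        # True when every sample falls on the same side of the threshold
        "constant_prediction": bool(np.unique(preds).size == 1),
    }
```

If `constant_prediction` is True and the accuracy matches one entry of `class_share(y_val_client)`, the evaluation loop itself is probably working; the issue would then be in what the model computes at inference time.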

Any ideas? Am I missing something in tff evaluation?
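For reference when reading the training loop in the question, here is a simplified sketch of what a single call to `iterative_process.next(...)` computes in weighted FedAvg. Each client trains locally, and the server averages the clients' model deltas weighted by their number of examples (the default client weighting of `build_weighted_fed_avg`). The sketch assumes a plain server step with learning rate 1.0 and ignores the `build_sgdm(learning_rate=0.1, momentum=0.9)` server optimizer used in the question; `fedavg_round` is a hypothetical helper, not a TFF function.

```python
import numpy as np

def fedavg_round(global_weights, client_updates, client_num_examples, server_lr=1.0):
    """One simplified round of weighted Federated Averaging.

    client_updates[k] is the list of per-tensor deltas (local - global) of client k.
    Deltas are averaged with weights proportional to each client's example count
    and applied with a plain SGD-style server step (no momentum).
    """
    total = float(sum(client_num_examples))
    new_weights = []
    for t, w in enumerate(global_weights):
        avg_delta = sum(n / total * upd[t]
                        for upd, n in zip(client_updates, client_num_examples))
        new_weights.append(w + server_lr * avg_delta)
    return new_weights
```

With NUM_ROUNDS = 5 and local_epochs = 3, the loop in the question therefore performs 15 such rounds, each containing a single local pass over every client's data. If several local epochs per round are intended, one option is to repeat each client dataset (for example `ds.repeat(local_epochs)`) and call `next` once per round.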

Answers (1)

Marlen

I tried fine-tuning after FL training, both on the server and on each client, and both ways give better accuracy in evaluation. The fine-tuning is done in the usual Keras way, like this:

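from sklearn.metrics import classification_report

# Copy the federated global model into a fresh Keras model.
# Only trainable weights are copied here; non-trainable weights (e.g.
# BatchNormalization moving statistics, if any) keep their initial values.
final_model_weights = iterative_process.get_model_weights(state)
best_model = create_dcnn_model()
for keras_w, fl_w in zip(best_model.trainable_weights, final_model_weights.trainable):
    keras_w.assign(fl_w)

# Fine-tune with ordinary Keras training
best_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
best_model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=3, batch_size=8)

# Threshold the sigmoid outputs at 0.5 to get class labels
predictions = best_model.predict(X_val)
labels = (predictions > 0.5).astype("int32")
print(classification_report(y_val, labels, target_names=['Not Eavesdropper', 'Eavesdropper']))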

This makes me think there is a problem or an error in the way I use the TensorFlow Federated process.
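One hypothesis worth checking (an assumption, not confirmed by anything above): the debug sums 64.0, 128.0 and 256.0 look like BatchNormalization scale (gamma) vectors initialized to ones, which would mean the DCNN contains BatchNormalization layers. Their moving mean and variance are non-trainable, so they do not appear in `.trainable`; if federated training leaves them at their initial values (mean 0, variance 1), inference-mode outputs are computed with the wrong statistics and can land on the same side of the 0.5 threshold for every sample. Keras `fit` does update these statistics, which would be consistent with fine-tuning helping. TFF's `ModelWeights.assign_weights_to(keras_model)` copies non-trainable weights as well, if you want to compare. The numpy sketch below only illustrates the inference-mode BatchNormalization formula with made-up numbers, pretending the normalized value were fed straight into a sigmoid; it does not reproduce the model.

```python
import numpy as np

def batchnorm_inference(x, gamma, beta, moving_mean, moving_var, eps=1e-3):
    """Inference-mode BatchNormalization (Keras formula, training=False)."""
    return gamma * (x - moving_mean) / np.sqrt(moving_var + eps) + beta

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up pre-activations of one feature for 4 samples: large offset, small spread
x = np.array([48.0, 50.0, 52.0, 54.0])
gamma, beta = 1.0, 0.0

# Default, never-updated statistics: the offset is not removed
stale = batchnorm_inference(x, gamma, beta, moving_mean=0.0, moving_var=1.0)
# Statistics that match the data: outputs are centred around zero
fresh = batchnorm_inference(x, gamma, beta, moving_mean=x.mean(), moving_var=x.var())

print((sigmoid(stale) > 0.5).astype(int))  # all samples on the same side
print((sigmoid(fresh) > 0.5).astype(int))  # samples split around the threshold
```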
