Reputation: 21
I am trying to implement a differentially private FL binary classification model using Gaussian adaptive clipping (the geometric method).
import tensorflow_federated as tff

aggregation_factory = tff.aggregators.DifferentiallyPrivateFactory.gaussian_adaptive(
    noise_multiplier=0.6,
    clients_per_round=10,
    initial_l2_norm_clip=0.1,
    target_unclipped_quantile=0.8,
    learning_rate=0.2)
I know that initial_l2_norm_clip is the initial value of the clipping norm, which is then updated based on the target_unclipped_quantile value.
How can we determine the appropriate value of initial_l2_norm_clip for a particular model?
When I set initial_l2_norm_clip to 0.1 I get a really low AUC (around 0.4), but when I set it to a higher value of 1.0 I get a better AUC (around 0.8). In both cases the 'clip' metric recorded by the iterative process always increases (i.e., it goes from 0.1 to 0.3 and from 1.0 to 1.2).
My model runs for 13 rounds with 10 clients per round. Does this make a difference?
Upvotes: 2
Views: 83
Reputation: 900
One thing I would flag is that 13 training rounds is relatively few in general. If you run the training for longer, I would expect the clip norm to eventually stabilize around the same value, regardless of the initial value.
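To illustrate why the starting value washes out over enough rounds, here is a minimal, noise-free sketch of a geometric clip update in plain NumPy. It is not TFF's implementation: the function name simulate_adaptive_clip, the log-normal distribution of update norms, and the omission of the noise that the private version adds to the clipped count are all assumptions made for illustration.

```python
import numpy as np

def simulate_adaptive_clip(initial_clip, rounds, target_quantile=0.8,
                           learning_rate=0.2, clients_per_round=10, seed=0):
    """Noise-free sketch of a geometric clip update over several rounds."""
    rng = np.random.default_rng(seed)
    clip = initial_clip
    history = [clip]
    for _ in range(rounds):
        # Hypothetical client update norms; real norms depend on model and data.
        norms = rng.lognormal(mean=0.0, sigma=0.5, size=clients_per_round)
        # Fraction of clients whose update would NOT be clipped this round.
        unclipped_fraction = np.mean(norms <= clip)
        # Geometric update: grow the clip when too few updates fit under it,
        # shrink it when too many do.
        clip = clip * np.exp(-learning_rate * (unclipped_fraction - target_quantile))
        history.append(float(clip))
    return history

for init in (0.1, 1.0):
    short_run = simulate_adaptive_clip(init, rounds=13)
    long_run = simulate_adaptive_clip(init, rounds=300)
    print(f"init={init}: after 13 rounds {short_run[-1]:.2f}, "
          f"after 300 rounds {long_run[-1]:.2f}")
```

Under these assumptions, the two starting values should still be far apart after 13 rounds but end up close to each other after a few hundred, which is the behavior described above.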
The point of adaptively selecting the clipping norm is that the initial norm should not matter that much as a hyperparameter. If the clipping norm reported in the metrics increases during training, it means initial_l2_norm_clip is small relative to the target_unclipped_quantile of the values actually seen at runtime. So you can increase the initial norm, and it should reach the target quantile faster. If you want to spend the time tuning this parameter, you can also use the gaussian_fixed constructor and keep the clipping norm constant throughout training.
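If you do want a rough starting point rather than a guess, one option is to look at the empirical quantile of client update norms directly. The sketch below is an assumption for illustration, not something TFF does for you: suggest_initial_clip is a hypothetical helper, and the norms are made-up numbers standing in for values you might collect in a non-private dry run on proxy data (norms computed non-privately on the real data would not be covered by the DP guarantee).

```python
import numpy as np

def suggest_initial_clip(update_norms, target_unclipped_quantile=0.8):
    """Return the norm below which the target fraction of updates falls."""
    norms = np.asarray(update_norms, dtype=float)
    return float(np.quantile(norms, target_unclipped_quantile))

# Hypothetical L2 norms of client updates from a non-private dry run.
norms = [0.6, 0.9, 1.1, 1.2, 1.4, 1.5, 1.7, 2.0, 2.3, 3.1]
clip = suggest_initial_clip(norms)

# Fraction of these updates that would pass through unclipped at that value.
unclipped_fraction = float(np.mean(np.asarray(norms) <= clip))
print(f"suggested clip: {clip:.2f}, unclipped fraction: {unclipped_fraction:.1f}")
```

A value in this neighborhood could be passed as initial_l2_norm_clip so the adaptive process starts near its target, or used as the fixed clip if you go the gaussian_fixed route.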
However, note that if you are interested in differential privacy, a larger clipping norm will likely degrade the guarantee you can get. So there is a tradeoff to be explored, together with the total number of rounds to train a model.
Upvotes: 2