Reputation: 584

TFF: Difference between splitting clients into train and test and splitting each client's dataset into train and test

In this paper, the authors choose 2500 training clients and 900 clients for evaluation, but in this tutorial, they split the dataset of each client into training and test sets. So, I would like to know which is better, and what is the importance of splitting clients into training and evaluation groups? Thanks!

Upvotes: 2

Answers (1)

Zachary Garrett

Reputation: 2941

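"Which evaluation method is better" depends on the question the investigator is trying to answer. Hold-out users (the paper) and hold-out examples (the tutorial) provide different information about the model and the training process: the first measures how well the model works for clients that never took part in training, while the second measures how well it works on unseen examples from clients that did.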
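
To make the two splits concrete, here is a minimal sketch in plain Python. It does not use the TFF API; the helper names `split_by_client` and `split_within_clients`, the `client_data` dictionary, and the toy sizes are all assumptions made for illustration.

```python
import random

def split_by_client(client_data, num_test_clients, seed=0):
    """Hold out whole clients: test clients are never seen during training."""
    client_ids = sorted(client_data)
    random.Random(seed).shuffle(client_ids)
    test_ids = set(client_ids[:num_test_clients])
    train = {cid: ex for cid, ex in client_data.items() if cid not in test_ids}
    test = {cid: ex for cid, ex in client_data.items() if cid in test_ids}
    return train, test

def split_within_clients(client_data, test_fraction=0.2, seed=0):
    """Hold out examples: every client appears in both train and test."""
    rng = random.Random(seed)
    train, test = {}, {}
    for cid in sorted(client_data):
        examples = list(client_data[cid])
        rng.shuffle(examples)
        n_test = int(len(examples) * test_fraction)
        test[cid] = examples[:n_test]
        train[cid] = examples[n_test:]
    return train, test

# Toy data: 10 hypothetical clients with 5 examples each.
client_data = {f"client_{i}": [(i, j) for j in range(5)] for i in range(10)}

# Paper-style split: disjoint sets of clients.
train_c, test_c = split_by_client(client_data, num_test_clients=3)

# Tutorial-style split: same clients, disjoint examples.
train_e, test_e = split_within_clients(client_data, test_fraction=0.2)
```

With the first split, evaluation answers "how does the model do on users it has never seen?"; with the second, it answers "how does the model do on new data from users it trained on?".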

The paper focuses on how global model accuracy relates to personalized model accuracy. Personalized accuracy measures how well each client's local model performs on that client's local data. In this scenario, each client ends up with a different model.

The tutorial is only investigating the first part, the global model accuracy, without continuing to personalize the models locally for each client.
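
The difference between the two measurements can be sketched with a toy one-parameter model. This is not the paper's personalization method; the scalar model, the `personalize` helper, the learning rate, and the data values are all assumptions chosen only to show the evaluation pattern: score the global model on a held-out client's test examples, then fine-tune on that client's other examples and score again.

```python
def mse(w, ys):
    """Mean squared error of a constant prediction w against targets ys."""
    return sum((w - y) ** 2 for y in ys) / len(ys)

def personalize(w, ys, lr=0.1, steps=20):
    """Take a few gradient steps on MSE, starting from the global parameter."""
    for _ in range(steps):
        grad = sum(2 * (w - y) for y in ys) / len(ys)
        w -= lr * grad
    return w

# Hypothetical held-out client whose data differ from what the global model fits.
global_w = 0.0
local_train = [2.0, 2.2, 1.8]   # examples used for personalization
local_test = [2.1, 1.9]         # examples used only for evaluation

# Global accuracy: evaluate the shared model as-is.
global_loss = mse(global_w, local_test)

# Personalized accuracy: fine-tune locally, then evaluate on the same test examples.
personal_w = personalize(global_w, local_train)
personal_loss = mse(personal_w, local_test)
```

In this toy case the personalized loss is lower than the global loss, but that outcome depends on the client having enough local data that resembles its test examples.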

In practical settings it's not always possible to personalize a model (some clients may have no data), and some personalization strategies may make the model worse for a given client. However, as the paper demonstrates, a working personalization strategy can significantly improve local client accuracy.

Upvotes: 2
