Reputation: 584
Why, in federated learning tasks, don't we split the dataset into train, validation, and test sets, and instead use only train and test?
Upvotes: 3
Views: 558
Reputation: 2941
The choice of how to split the datasets is really up to the evaluator and what they are trying to accomplish. The preprocessed datasets in TFF (from `tff.simulation.datasets`) are usually only split into two, but they can be rejoined and split again in whatever way is desired.
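As a rough illustration of "rejoin and split again", here is a minimal sketch in plain Python. It does not use the TFF API: each split is modeled as a dict mapping a client id to a list of examples, and the helper names `rejoin` and `split_clients` (and the 80/10/10 fractions) are hypothetical choices made for this example only.

```python
import random


def rejoin(*splits):
    """Pool several {client_id: [examples]} splits into one dict."""
    pooled = {}
    for split in splits:
        for client_id, examples in split.items():
            # A client may appear in more than one split; concatenate its data.
            pooled.setdefault(client_id, []).extend(examples)
    return pooled


def split_clients(client_ids, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle client ids and cut them into train / validation / test groups."""
    ids = sorted(client_ids)           # sort first so the result is reproducible
    random.Random(seed).shuffle(ids)   # seeded shuffle, independent of global state
    n_train = int(fractions[0] * len(ids))
    n_val = int(fractions[1] * len(ids))
    train_ids = ids[:n_train]
    val_ids = ids[n_train:n_train + n_val]
    test_ids = ids[n_train + n_val:]   # whatever remains goes to test
    return train_ids, val_ids, test_ids
```

With real TFF data the same idea would apply to the client ids of the loaded datasets; how the per-client datasets are then rebuilt depends on the TFF version in use.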
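One thing to consider: there are (at least) two dimensions that may be interesting to split on for federated learning: across clients (holding out entire clients, to measure how well the model generalizes to unseen users) and within each client's data (holding out some examples from every client, to measure how well the model generalizes to unseen data from seen users).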
Furthermore, both of these could be time-based (if there is a notion of time), for example splitting each client's dataset into "previous day" (train) and "next day" (test). Or, as is often the case in practice with cross-device FL, splitting by time of day, since the users available for training at night may be different from those available mid-day. Eichner et al. 2019 performed some experiments using this setup.
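A time-based split of a single client's data could look something like the sketch below. It is plain Python, not TFF code, and it assumes each example is a `(timestamp, payload)` tuple; the function name `split_client_by_time` and the `cutoff` parameter are hypothetical.

```python
def split_client_by_time(examples, cutoff):
    """Split one client's (timestamp, payload) examples at a cutoff time.

    Examples strictly before the cutoff go to train ("previous day"),
    the rest go to test ("next day").
    """
    train = [ex for ex in examples if ex[0] < cutoff]
    test = [ex for ex in examples if ex[0] >= cutoff]
    return train, test


# Usage: timestamps are day indices here, purely for illustration.
client_examples = [(0, "x0"), (0, "x1"), (1, "x2"), (1, "x3")]
prev_day, next_day = split_client_by_time(client_examples, cutoff=1)
```

Applying this to every client keeps all clients in both splits, so the test split measures performance on later data from users the model has already seen.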
Note: `tff.simulation.datasets.stackoverflow.load_data` does have three splits, named `train`, `held_out`, and `test`. Please read the documentation carefully, as it utilizes both types of splitting mentioned above.
Upvotes: 3