How does MLflow track the data used in experiments?

Question

I am just starting to learn about MLflow, so apologies if I don't use the correct terminology.

I have done some coding and run some experiments with MLflow, in which I named an experiment and tracked some metrics, plots, and even models.

Later, in the MLflow UI, I can see a list of experiments with their tracked elements and artifacts.

My question is: how does this work with datasets?

For example, if I use a particular dataset to train a model, or to run inference with one, and some metrics are recorded, how can I track that this particular dataset was used to obtain a particular metric?

I imagine that the entire dataset is not stored, is it? That would use a lot of disk space.

How does MLflow track the data used in experiments?

Answers (1)
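
Not the original answer, just a minimal sketch of the general idea, which may help frame the question: rather than storing the whole dataset with each run, you can record a small fingerprint of it (a name, a content hash, the row count, the column names) and attach that to the run. As far as I know, this is also roughly what MLflow's own dataset tracking (the `mlflow.data` module together with `mlflow.log_input`, in recent versions) records: metadata such as name, digest, schema, and source, not the data itself. The helper `dataset_fingerprint` and the sample rows below are hypothetical and use only the standard library.

```python
import hashlib
import json


def dataset_fingerprint(rows, name):
    """Summarize a dataset as small metadata instead of storing the rows.

    `rows` is a list of dicts (one per record); `name` is a label you choose.
    This is a hypothetical helper, not part of MLflow.
    """
    hasher = hashlib.sha256()
    for row in rows:
        # sort_keys makes the serialization independent of dict key order
        hasher.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        hasher.update(b"\n")
    # Union of all keys seen in any row, in a stable order
    columns = sorted({key for row in rows for key in row})
    return {
        "name": name,
        "digest": hasher.hexdigest()[:16],  # shortened hash for readability
        "num_rows": len(rows),
        "columns": columns,
    }


# Made-up sample data, only for illustration
train_rows = [
    {"age": 34, "income": 52000, "label": 1},
    {"age": 51, "income": 61000, "label": 0},
]
info = dataset_fingerprint(train_rows, name="train_v1")
print(info["name"], info["num_rows"], info["columns"])
```

Inside a run, a dictionary like `info` could be attached with `mlflow.log_params` or `mlflow.set_tag`, so that every metric of that run can be traced back to the digest of the data that produced it. If the digest changes between two runs, the data changed. Whether to log the actual file as an artifact on top of this is a separate decision that depends on its size.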
