btomtom5

Reputation: 860

Machine Learning Development Workflow for Large Datasets

What workflow do you use when you have a large dataset of 300 GB and your computer only has 250 GB of memory?

I would definitely use a dev set locally, but do you put the full 300 GB in an S3 bucket for production, so that it is easy to power down the AWS instance when you are not using it and easy to extract the model when the computation is done?

I did a couple of basic measurements, and it takes 5 seconds on average to load a file from S3. Does S3 perform significantly better when the files are stored in bigger chunks?
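For reference, this is roughly how I measured it: a minimal sketch assuming boto3 credentials are already configured, and the bucket and key names below are placeholders for objects of different sizes.

```python
# Time S3 downloads for objects of different sizes.
# Assumes boto3 credentials are configured; bucket/keys are hypothetical.
import time

import boto3

s3 = boto3.client("s3")

BUCKET = "my-training-data"          # hypothetical bucket name
KEYS = [
    "shards/small-10mb.parquet",     # hypothetical keys of increasing size
    "shards/medium-100mb.parquet",
    "shards/large-1gb.parquet",
]

for key in KEYS:
    start = time.perf_counter()
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    elapsed = time.perf_counter() - start
    mb = len(body) / 1e6
    print(f"{key}: {mb:.1f} MB in {elapsed:.2f} s ({mb / elapsed:.1f} MB/s)")
```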

Upvotes: 0

Views: 78

Answers (1)

wind

Reputation: 1020

It depends (as usual). :)

  1. You can try to filter your data during load (corrupted examples, outliers, etc.).
  2. If you need all the data at once, you can use distributed computing (look at http://spark.apache.org - a popular distributed computation framework) together with a machine learning library that runs on it (e.g. https://spark.apache.org/mllib/). See the sketch after this list.
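A minimal PySpark sketch of both points, filtering during load and then training with MLlib. The S3 path, column names, and thresholds are assumptions you would adapt to your own schema.

```python
# Filter bad rows while loading from S3, then train with Spark MLlib.
# The data stays distributed, so it never has to fit on one machine.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("large-dataset-training").getOrCreate()

# 1. Load straight from S3 and filter during load (corrupted rows, outliers).
df = (
    spark.read.parquet("s3a://my-training-data/shards/")  # hypothetical path
    .dropna(subset=["feature_a", "feature_b", "label"])   # drop corrupted rows
    .filter(F.col("feature_a") < 1e6)                     # crude outlier cut
)

# 2. Train a model with MLlib on the distributed DataFrame.
assembler = VectorAssembler(
    inputCols=["feature_a", "feature_b"], outputCol="features"
)
train_df = assembler.transform(df)

model = LogisticRegression(featuresCol="features", labelCol="label").fit(train_df)

# Save the fitted model back to S3 so you can shut the cluster down afterwards.
model.write().overwrite().save("s3a://my-training-data/models/lr")  # hypothetical
```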

Upvotes: 1
