Reputation: 35
I am trying to build a training set for SageMaker's Linear Learner algorithm. The algorithm accepts training data as RecordIO-wrapped protobuf or CSV. The training data is generated with Spark, and I am having trouble writing a CSV file from a DataFrame (this seems broken for now), so I am trying to use protobuf instead.
I managed to create a binary file for the training dataset using Protostuff, a library that generates protobuf messages from POJOs. The problem is that when I trigger the training job, SageMaker returns this message: ClientError: No training data processed. Either the training channel is empty or the mini-batch size is too high. Verify that training data contains non-empty files and the mini-batch size is less than the number of records per training host.
The training file is certainly not empty. I suspect that the way I generate the training data is incorrect, since I am able to train models using the libsvm format. Is there a way to generate RecordIO records using the SageMaker Java client?
Upvotes: 1
Views: 251
Reputation: 35
Answering my own question: it was an issue in the algorithm configuration, not in the training file. I reduced the mini-batch size and training worked fine.
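For anyone hitting the same error, here is a rough sanity check based only on the wording of the error message (the mini-batch size must be less than the number of records per training host). It is a minimal sketch, not SageMaker's actual validation logic: the function names are hypothetical, the numbers are made up, and it assumes the records are split evenly across hosts, which depends on how the input channel is distributed.

```python
def records_per_host(total_records: int, instance_count: int) -> int:
    """Records seen by one host, ASSUMING an even split across hosts."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return total_records // instance_count


def mini_batch_size_ok(total_records: int, instance_count: int,
                       mini_batch_size: int) -> bool:
    """True if the mini-batch size is positive and strictly smaller than
    the per-host record count, as the error message asks for."""
    per_host = records_per_host(total_records, instance_count)
    return 0 < mini_batch_size < per_host


if __name__ == "__main__":
    # Illustrative numbers only: a small dataset on a single host.
    print(mini_batch_size_ok(800, 1, 1000))  # batch larger than the dataset
    print(mini_batch_size_ok(800, 1, 100))   # batch well under the dataset size
```

If the check fails, either lower the mini-batch size (as I did) or train on more records per host.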
Upvotes: 1