Reputation: 560
I have a Glue job that outputs a .out file into S3. The format of this file is fine for training a TensorFlow model on SageMaker (using script mode), but I am struggling to parse this data when running a batch transform.
I'm using the input_handler and output_handler functions per the recommended inference.py scripting approach, but I'm not sure whether I should treat the .out file as application/json, text/csv, or something else entirely.
Example of the inference.py file: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_batch_transform/tensorflow_cifar-10_with_inference_script/code/inference.py
Upvotes: 0
Views: 684
Reputation: 558
What the input_handler should do depends on the data format of the .out file.
Batch Transform takes the data in that .out file, puts it into the request payload of an HTTP request, and sends that request to the input_handler. For example, if your .out file is line-separated JSON, your input_handler should read the data from the request just like it would read the same data from a file.
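As a minimal sketch, assuming the .out file holds one JSON record per line (adjust the parsing if your file uses a different layout), the handler could look like this:

```python
import json

def input_handler(data, context):
    """Parse a line-separated JSON payload into a TF Serving predict request."""
    # data is a stream containing the raw request body
    lines = data.read().decode("utf-8").strip().split("\n")
    instances = [json.loads(line) for line in lines]
    # TF Serving's REST API expects {"instances": [...]}
    return json.dumps({"instances": instances})
```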
Batch can also split the data on a delimiter (for example, newlines) and send individual records or mini-batches to the model server, in which case your input_handler would handle those individual chunks or records; see the sketch below.
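Splitting is configured on the transform job rather than in the handler. Here is a hedged sketch using the SageMaker Python SDK's `split_type` parameter; the S3 paths, role, and framework version are placeholders for your own values:

```python
from sagemaker.tensorflow import TensorFlowModel

# model_data and role are placeholders -- substitute your artifact and IAM role
model = TensorFlowModel(
    model_data="s3://my-bucket/model.tar.gz",
    role="my-sagemaker-role",
    framework_version="2.8",
    entry_point="inference.py",
)

transformer = model.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# split_type="Line" tells Batch to break the .out file on newlines and send
# the resulting records to the container in smaller payloads
transformer.transform(
    data="s3://my-bucket/glue-output/data.out",
    content_type="application/json",
    split_type="Line",
)
```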
If you know the data format of your .out file, you can ignore the content type in the handler. The content type is a string that Batch Transform adds to requests so the model server can switch what it does based on the data format, but the meaning of that string (whether it's "application/json" or "application/foo") doesn't change the behavior of Batch or the model server.
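To illustrate that the string is just a label the handler may key on, here is a sketch of an input_handler that branches on `context.request_content_type`; the CSV branch and its record layout are assumptions for illustration:

```python
import json

def input_handler(data, context):
    """Switch parsing on the content type string set on the transform job."""
    body = data.read().decode("utf-8")
    if context.request_content_type == "text/csv":
        # hypothetical CSV layout: one record per line, comma-separated floats
        instances = [
            [float(v) for v in line.split(",")]
            for line in body.strip().split("\n")
        ]
    else:
        # treat anything else as line-separated JSON
        instances = [json.loads(line) for line in body.strip().split("\n")]
    return json.dumps({"instances": instances})
```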
Upvotes: 1