How to match input/output with SageMaker batch transform?

Question

I'm using SageMaker batch transform with JSON input files (see the sample input and output files below). I have the custom inference code below, and I use json.dumps to return the prediction, but the output is not JSON. I tried to match input and output with "DataProcessing": {"JoinSource": "string"}, but I get the error "unable to marshall ...". I think that's because output_fn is returning an array of lists, or just a list, rather than a JSON object, so the input can't be matched with the output. Any suggestions on how I should return the data?

inference code

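import json

# Assumes the legacy sagemaker-containers framework (as in the SageMaker scikit-learn container).
from sagemaker_containers.beta.framework import worker


def model_fn(model_dir):
    ...

def input_fn(data, content_type):
    ...

def predict_fn(data, model):
    ...

def output_fn(prediction, accept):
    if accept == "application/json":
        return worker.Response(json.dumps(prediction), mimetype=accept)
    raise RuntimeError("{} accept type is not supported by this script.".format(accept))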

input file

{"data" : "input line  one" }
{"data" : "input line  two" }
....

output file

["output line  one" ]
["output line  two" ]

transform job configuration

{
   "BatchStrategy": "SingleRecord",
   "DataProcessing": {
      "JoinSource": "string"
   },
   "MaxConcurrentTransforms": 3,
   "MaxPayloadInMB": 6,
   "ModelClientConfig": { 
      "InvocationsMaxRetries": 1,
      "InvocationsTimeoutInSeconds": 3600
   },
   "ModelName": "some-model",
   "TransformInput": { 
      "ContentType": "string",
      "DataSource": { 
         "S3DataSource": { 
            "S3DataType": "string",
            "S3Uri": "s3://bucket-sample"
         }
      },
      "SplitType": "Line"
   },
   "TransformJobName": "transform-job"
}

Marc Karp · Accepted Answer

json.dumps serializes whatever object you pass it: a list becomes a JSON array such as ["output line one"], not a JSON object. It will not wrap your array in a dict for you.
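
A quick local check with only the standard library shows the difference: a list serializes to a JSON array, while wrapping it in a dict yields a JSON object. The key name output is just an illustration.

```python
import json

prediction = ["output line one"]

# A list serializes to a JSON array, not a JSON object.
as_array = json.dumps(prediction)

# Wrapping the list in a dict produces a JSON object with a named key.
as_object = json.dumps({"output": prediction})

print(as_array)   # ["output line one"]
print(as_object)  # {"output": ["output line one"]}
```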

What data type is prediction? Have you tested to make sure prediction is a dict?

You can confirm the data type by adding print(type(prediction)) to output_fn and checking the result in the CloudWatch Logs.
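
If predict_fn returns something other than a plain list or dict (for example a NumPy array, which json.dumps cannot serialize directly), it can help to log the type and convert the value first. The helper below is a hypothetical sketch; it assumes any non-standard result exposes a tolist() method, as NumPy arrays do. FakeArray is only a stand-in so the sketch runs without NumPy.

```python
import json


def to_jsonable(prediction):
    """Hypothetical helper: convert a prediction into something json.dumps accepts."""
    # When run inside the container, this line shows up in CloudWatch Logs.
    print(type(prediction))
    # NumPy arrays (and similar objects) expose tolist(); plain lists and dicts pass through.
    if hasattr(prediction, "tolist"):
        return prediction.tolist()
    return prediction


class FakeArray:
    """Stand-in for a NumPy array so this sketch runs without NumPy installed."""

    def __init__(self, values):
        self.values = values

    def tolist(self):
        return list(self.values)


print(json.dumps({"output": to_jsonable(FakeArray(["output line one"]))}))
```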

If prediction is a list, you can test the following:

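import json

# Assumes the legacy sagemaker-containers framework (as in the SageMaker scikit-learn container).
from sagemaker_containers.beta.framework import worker


def output_fn(prediction, accept):
    if accept == "application/json":
        # Wrap the list in a dict so json.dumps emits a JSON object.
        my_dict = {'output': prediction}
        return worker.Response(json.dumps(my_dict), mimetype=accept)

    raise RuntimeError("{} accept type is not supported by this script.".format(accept))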
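
To check the serialization logic before running a transform job, the body of output_fn can be exercised locally. This sketch returns the JSON string directly instead of any container-specific response object; the records list stands in for lines of the input file, and fake_predict is a hypothetical placeholder for predict_fn.

```python
import json


def output_fn(prediction, accept):
    # Wrap the list in a dict so the response body is a JSON object.
    if accept == "application/json":
        return json.dumps({"output": prediction})
    raise RuntimeError("{} accept type is not supported by this script.".format(accept))


def fake_predict(record):
    # Hypothetical placeholder for predict_fn: echo the input text back inside a list.
    return [record["data"].replace("input", "output")]


records = ['{"data" : "input line one" }', '{"data" : "input line two" }']

# With SplitType "Line" and BatchStrategy "SingleRecord", each line is sent as its own request.
for line in records:
    body = output_fn(fake_predict(json.loads(line)), "application/json")
    print(body)
```

Each printed line is now a JSON object, for example {"output": ["output line one"]}, rather than a bare array.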

DataProcessing and JoinSource are used to attach each input record to its prediction results in the output (set JoinSource to "Input" to join them; the default "None" does not). They are not meant to make the output format match the input format.
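
For reference, here is a sketch of request parameters that would join each prediction to its input record. It assumes one JSON object per line in the input files; the job name, model name, S3 paths, and instance settings are placeholders. According to the DataProcessing documentation, for JSON input with JoinSource set to "Input", SageMaker adds the prediction to the input object under a SageMakerOutput key, and AssembleWith "Line" writes one output record per line. The dict is meant for boto3's create_transform_job; that call is left commented out so the sketch runs without AWS credentials.

```python
import json

# Placeholder names and paths; replace with your own resources.
params = {
    "TransformJobName": "transform-job",
    "ModelName": "some-model",
    "BatchStrategy": "SingleRecord",
    "MaxConcurrentTransforms": 3,
    "MaxPayloadInMB": 6,
    "TransformInput": {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://bucket-sample/input/",
            }
        },
        # Assumes one JSON object per line, parsed by input_fn as application/json.
        "ContentType": "application/json",
        "SplitType": "Line",
    },
    "TransformOutput": {
        "S3OutputPath": "s3://bucket-sample/output/",
        # Must match the accept type handled in output_fn.
        "Accept": "application/json",
        # Write one output record per line so results line up with the input file.
        "AssembleWith": "Line",
    },
    "DataProcessing": {
        # "Input" joins each input record with its prediction; "None" (the default) does not.
        "JoinSource": "Input",
    },
    "TransformResources": {
        "InstanceType": "ml.m5.large",
        "InstanceCount": 1,
    },
}

print(json.dumps(params["DataProcessing"]))

# import boto3
# boto3.client("sagemaker").create_transform_job(**params)
```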
