Liam Ferris

Reputation: 1846

How to specify a name for the output file of a SageMaker Batch Transform job?

I have a Batch Transform job set up in AWS SageMaker. Currently it uses some input data and a pre-trained model. The job is orchestrated with the boto3 Python library from within a Lambda function.

Something I am having difficulty with is a good way to specify the name of the output file, in our case a predictions.csv. Ideally we would like to add a timestamp to this name.

The first thing I tried was passing a filename to the pandas DataFrame.to_csv() method. However, with only this change, SageMaker fails with the following error:

TypeError: The view function did not return a valid response. The function either returned None or ended without a return statement.

This is a pretty weird error, especially given the code change that causes it.

I have also tried adding a filename to the output_path parameter of the SageMaker transformer object. This parameter is only intended to specify the S3 folder path, and adding a filename at the end just produces a weirdly named S3 folder (e.g. output/stillafolder.csv/predictions.csv).

The only way I have found to change the output filename is to change the input filename: I have observed (although I have not found any documentation on this) that the output filename matches the input filename by default.

This isn't great for my current purposes, though, so any advice would be much appreciated!

Upvotes: 3

Views: 3466

Answers (1)

bobbruno

Reputation: 94

According to the SageMaker Developer Documentation:

For every S3 object used as input for the transform job, batch transform stores the transformed data with an .out suffix in a corresponding subfolder in the location in the output prefix.

In other words, you can't tell SageMaker to generate a specific file name. It takes each input file and appends .out to its name. The output_path in the Python SDK maps to the S3OutputPath field that the quote above describes, and its purpose is to specify a different bucket and folder structure, not the file name itself.
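To make the naming rule concrete, here is a minimal sketch of how you could predict the key of the result file. The helper name default_output_key is hypothetical, and it assumes the input object sits directly under the input prefix, so no extra subfolder is added on the output side. Check it against what your job actually writes.

```python
import posixpath


def default_output_key(output_prefix, input_key):
    """Guess the S3 key batch transform writes for one input object.

    Assumption: the input object sits directly under the input prefix,
    so the result lands at '<output prefix>/<input file name>.out'.
    """
    input_name = posixpath.basename(input_key)
    return posixpath.join(output_prefix.strip("/"), input_name + ".out")
```

For an input at input/data.csv and an output prefix of output/, this gives output/data.csv.out, which is the key you would later look up in S3.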

If you need a specific file name, you should add an S3 call after the SageMaker invocation to move the result file to the name/location you want. If you input several files and you want a single output, you need to add code to concatenate the outputs.
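A rough sketch of that move step, assuming the transform job has already finished and you know the key of the .out file. S3 has no rename operation, so this copies the object to the new key and then deletes the original. The helper names timestamped_key and rename_s3_object are made up for illustration, and the bucket and key in the commented usage are placeholders. copy_object and delete_object are regular boto3 S3 client calls; note that a single copy_object call only handles objects up to 5 GB.

```python
from datetime import datetime, timezone


def timestamped_key(prefix, stem="predictions", ext=".csv", now=None):
    """Build a key such as 'output/predictions-20240101T120000Z.csv'."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix.rstrip('/')}/{stem}-{stamp}{ext}"


def rename_s3_object(s3_client, bucket, source_key, target_key):
    """'Rename' an object within one bucket: copy it, then delete the original."""
    s3_client.copy_object(
        Bucket=bucket,
        Key=target_key,
        CopySource={"Bucket": bucket, "Key": source_key},
    )
    s3_client.delete_object(Bucket=bucket, Key=source_key)
    return target_key


# Usage (placeholder bucket and key, run after the job has completed):
# import boto3
# s3 = boto3.client("s3")
# rename_s3_object(s3, "my-bucket", "output/input.csv.out", timestamped_key("output"))
```

The client is passed in as a parameter so the same function works with whatever session or credentials your Lambda already uses.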

Upvotes: 2
