Reputation: 31
I have created a K-means training job with a csv file that I have stored in S3. After a while I receive the following error:
Training failed with the following error: ClientError: Rows 1-5000 in file /opt/ml/input/data/train/features have more fields than than expected size 3.
What could be the issue with my file?
Here are the parameters I am passing to sagemaker.create_training_job:
    TrainingJobName=job_name,
    HyperParameters={
        'k': '2',
        'feature_dim': '2'
    },
    AlgorithmSpecification={
        'TrainingImage': image,
        'TrainingInputMode': 'File'
    },
    RoleArn='arn:aws:iam::<my_acc_number>:role/MyRole',
    OutputDataConfig={
        "S3OutputPath": output_location
    },
    ResourceConfig={
        'InstanceType': 'ml.m4.xlarge',
        'InstanceCount': 1,
        'VolumeSizeInGB': 20,
    },
    InputDataConfig=[
        {
            'ChannelName': 'train',
            'ContentType': 'text/csv',
            "CompressionType": "None",
            "RecordWrapperType": "None",
            'DataSource': {
                'S3DataSource': {
                    'S3DataType': 'S3Prefix',
                    'S3Uri': data_location,
                    'S3DataDistributionType': 'FullyReplicated'
                }
            }
        }
    ],
    StoppingCondition={
        'MaxRuntimeInSeconds': 600
    }
Upvotes: 2
Views: 1439
Reputation: 1
Make sure your .csv doesn't have column headers, and that the label is the first column. Also make sure the values of your hyperparameters are accurate, i.e. feature_dim must equal the number of features in your data set. If you give it the wrong value, training will fail.
Here's a list of SageMaker kNN hyperparameters and their meanings: https://docs.aws.amazon.com/sagemaker/latest/dg/kNN_hyperparameters.html
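The checks above can be run locally before uploading the file. Below is a minimal sketch, not part of any SageMaker SDK: `check_training_csv` is a hypothetical helper, and the rule that each row should hold `feature_dim + label_size` fields is an assumption inferred from the error message in the question ("expected size 3" with `feature_dim` set to 2).

```python
import csv
import io


def check_training_csv(text, feature_dim, label_size=1):
    """Return a list of problems found in CSV text (hypothetical helper).

    Assumption: each row should contain feature_dim + label_size fields.
    """
    expected = feature_dim + label_size
    problems = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) != expected:
            problems.append(
                f"line {line_no}: {len(row)} fields, expected {expected}"
            )
        for value in row:
            try:
                float(value)
            except ValueError:
                # A non-numeric cell usually means a header row slipped in.
                problems.append(
                    f"line {line_no}: non-numeric value {value!r} (header row?)"
                )
                break
    return problems


if __name__ == "__main__":
    sample = "x,y\n1.0,2.0\n3.0,4.0,5.0\n"
    for problem in check_training_csv(sample, feature_dim=2, label_size=0):
        print(problem)
```

Running it on a small sample of the real file is usually enough to spot a header row or a column count that does not match feature_dim.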
Upvotes: 0
Reputation: 23
I've seen this issue appear when doing unsupervised learning, such as the clustering example above. If you have a CSV input, you can also address this issue by setting label_size=0 in the ContentType parameter of the SageMaker API call, within the InputDataConfig branch.
Here's an example of what the relevant section of the call might look like:
    "InputDataConfig": [
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "some/path/in/s3",
                    "S3DataDistributionType": "ShardedByS3Key"
                }
            },
            "CompressionType": "None",
            "RecordWrapperType": "None",
            "ContentType": "text/csv;label_size=0"
        }
    ]
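For the Python call in the question, the same channel definition could be built with a small helper and passed as InputDataConfig. This is only a sketch: `csv_channel` is a hypothetical function, the S3 URI is a placeholder, and the distribution type is copied from the question rather than from this answer.

```python
def csv_channel(s3_uri, channel_name="train", label_size=0):
    """Build one InputDataConfig entry for unlabeled CSV data (hypothetical helper)."""
    return {
        "ChannelName": channel_name,
        # label_size=0 declares that the CSV has no label column.
        "ContentType": f"text/csv;label_size={label_size}",
        "CompressionType": "None",
        "RecordWrapperType": "None",
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


if __name__ == "__main__":
    # Placeholder URI; substitute the real data location.
    channel = csv_channel("s3://example-bucket/train/")
    print(channel["ContentType"])
```

The resulting dictionary would go into the call as InputDataConfig=[csv_channel(data_location)], with the other arguments left as in the question.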
Upvotes: 2