garysieling

Reputation: 356

Correct parameters for training AWS Sagemaker with multiple classes per image

I've consistently found that when I set "multi_label" to "1" for image classification jobs, they crash with the following error:

Algorithm Error: Internal Server Error
[15:56:08] /opt/brazil-pkg-cache/packages/MXNetECL/MXNetECL-master.657.0/AL2012/generic-flavor/src/src/operator/custom/custom.cc:418: Check failed: reinterpret_cast<CustomOpFBFunc>(params.info->callbacks[kCustomOpBackward])( ptrs.size(), const_cast<void**>(ptrs.data()), const_cast<int*>(tags.data()), reinterpret_cast<const int*>(req.data()), static_cast<int>(ctx.is_train), params.info->contexts[kC
15:56:08 Stack trace returned 7 entries:
15:56:08 [bt] (0) /opt/amazon/lib/libaialgsdataiter.so(dmlc::StackTrace()+0x3d) [0x7f85e19f179d]
15:56:08 [bt] (1) /opt/amazon/lib/libaialgsdataiter.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x1a) [0x7f85e19f1a3a] 
15:56:08 [bt] (2) /opt/amazon/lib/libmxnet.so(+0x26da8fd) [0x7f85d0edb8fd]
15:56:08 [bt] (3) /opt/amazon/lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<mxnet::op::custom::CustomOperator::CustomOperator()::{lambda()#1} ()> >::_M_run()+0x12f) [0x7f85d0ede0ef]
15:56:08 [bt] (4) /opt/amazon/lib/libstdc++.so.6(+0xce440) [0x7f85cc9ea440]
15:56:08 [bt] (5) /lib64/libpthread.so.0(+0x7dc5) [0x7f85e31e1dc5]
15:56:08 [bt] (6) /lib64/libc.so.6(clone+0x6d) [0x7f85e25de6ed]
15:56:08 Algorithm Error: Internal Server Error

Based on my understanding of the documentation, this parameter should let you assign multiple tags to each image. Is there a trick to getting it to work, or to debugging these stack traces? (https://docs.aws.amazon.com/sagemaker/latest/dg/IC-Hyperparameter.html)
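To make a question like this easier to reproduce, it can help to post the exact hyperparameters alongside the error. Below is a minimal sketch of a hyperparameter dict for a multi-label job plus a few sanity checks. The keys come from the hyperparameter page linked above; all values are hypothetical placeholders, and check_hyperparameters is an illustrative helper, not part of any SageMaker API. SageMaker passes hyperparameters as strings, so the values are strings here too.

```python
# Keys follow the SageMaker image classification hyperparameter reference;
# every value below is a hypothetical placeholder.
HYPERPARAMETERS = {
    "num_layers": "18",
    "image_shape": "3,224,224",
    "num_classes": "4",
    "num_training_samples": "1000",
    "mini_batch_size": "32",
    "epochs": "10",
    "multi_label": "1",
}


def check_hyperparameters(hp: dict) -> list:
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    # multi_label is documented as a 0/1 flag.
    if hp.get("multi_label", "0") not in ("0", "1"):
        problems.append("multi_label must be '0' or '1'")
    # num_classes must parse as a positive integer.
    try:
        if int(hp["num_classes"]) < 1:
            problems.append("num_classes must be a positive integer")
    except (KeyError, ValueError):
        problems.append("num_classes must be a positive integer")
    # image_shape is given as "channels,height,width".
    shape = hp.get("image_shape", "").split(",")
    if len(shape) != 3 or not all(part.strip().isdigit() for part in shape):
        problems.append("image_shape should look like 'channels,height,width'")
    return problems


if __name__ == "__main__":
    print(check_hyperparameters(HYPERPARAMETERS) or "no problems found")
```

These checks only catch malformed values; a well-formed config can still crash if the training data's labels don't match num_classes.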

Upvotes: 0

Views: 156

Answers (2)

Xiong Zhou

Reputation: 1

Can you please check the RecordIO file you use for training? Please follow this example for how to prepare a dataset for multi-label training.
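As a rough illustration of what "preparing a dataset for multi-label training" involves, here is a sketch that writes .lst lines with multi-hot labels. It assumes the tab-separated layout used in the SageMaker multi-label examples as I understand them: an integer index, one 0/1 column per class, then the relative image path. Please verify that layout against the official example before relying on it. The function names and sample paths are hypothetical.

```python
def to_multi_hot(labels, num_classes):
    """Turn class indices like [0, 2] into a 0/1 vector like [1, 0, 1, 0]."""
    vector = [0] * num_classes
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} outside range 0..{num_classes - 1}")
        vector[label] = 1
    return vector


def make_lst_lines(samples, num_classes):
    """samples: iterable of (relative_image_path, list_of_class_indices).

    Assumed layout per line: index <TAB> one 0/1 per class <TAB> path.
    """
    lines = []
    for index, (path, labels) in enumerate(samples):
        hot = [str(v) for v in to_multi_hot(labels, num_classes)]
        lines.append("\t".join([str(index)] + hot + [path]))
    return lines


def check_lst_line(line, num_classes):
    """Return None if the line fits the assumed layout, else a description."""
    fields = line.rstrip("\n").split("\t")
    # Expect: index + num_classes label columns + path.
    if len(fields) != num_classes + 2:
        return f"expected {num_classes + 2} fields, got {len(fields)}"
    if any(f not in ("0", "1") for f in fields[1:-1]):
        return "label columns must be 0 or 1"
    return None


if __name__ == "__main__":
    for line in make_lst_lines([("cats/1.jpg", [0, 2]), ("dogs/7.jpg", [1])], 4):
        print(line, check_lst_line(line, 4))
```

If the data is then packed into RecordIO with MXNet's im2rec.py, multi-dimensional labels need its --pack-label option; otherwise only the first label column may end up in the record, which is one plausible (unconfirmed) source of a crash like the one above.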

Upvotes: 0

Julien Simon

Reputation: 2719

Oh, that's ugly... Can you please share code that would let us reproduce the bug? Full logs would be useful too. Happy to raise a support ticket on your behalf.

Julien (AWS)

Upvotes: 1
