Reputation: 516
As the title says, I am attempting to use Turi Create on an AWS SageMaker notebook instance with Python 3.6 (the conda_amazonei_mxnet_p36 environment). Even though CUDA 10.0 is installed by default, CUDA 8.0 also comes pre-installed and can be selected with the following commands in the notebook:
!sudo rm /usr/local/cuda
!sudo ln -s /usr/local/cuda-8.0 /usr/local/cuda
I have verified this installation with nvcc --version and also by building and running the deviceQuery sample:
$ cd /usr/local/cuda/samples/1_Utilities/deviceQuery
$ sudo make
$ ./deviceQuery
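If the symlink took effect, nvcc should report release 8.0 and the deviceQuery run should end with Result = PASS; a quick sanity check (exact strings may vary by toolkit build):
$ nvcc --version | grep release
$ ./deviceQuery | grep Result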
Next, in my notebook I install Turi Create and the correct version of mxnet for CUDA 8.0:
!pip install turicreate==5.4
!pip uninstall -y mxnet
!pip install mxnet-cu80==1.1.0
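For example, a minimal check that the right build is active (assuming the kernel has been restarted so the freshly installed wheel is loaded):
import mxnet as mx
print(mx.__version__)             # expect 1.1.0 for the mxnet-cu80 wheel
mx.nd.zeros((1,), ctx=mx.gpu(0))  # raises MXNetError if the CUDA 8.0 build can't reach a GPU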
Then, I prepare my images and attempt to create a model:
import turicreate as tc

tc.config.set_num_gpus(-1)  # use all available GPUs

# load the images and join them with their bounding-box annotations
# (annotations_ is an SFrame prepared earlier, not shown here)
images = tc.image_analysis.load_images('images', ignore_failure=True)
data = images.join(annotations_)

# 80/20 train/test split, then train the object detector
train_data, test_data = data.random_split(0.8)
model = tc.object_detector.create(train_data, max_iterations=50)
Running tc.object_detector.create then produces the following output:
Using 'image' as feature column
Using 'annotaion' as annotations column
Downloading https://docs-assets.developer.apple.com/turicreate/models/darknet.params
Download completed: /var/tmp/model_cache/darknet.params
Setting 'batch_size' to 32
Using GPUs to create model (Tesla K80, Tesla K80, Tesla K80, Tesla K80, Tesla K80, Tesla K80, Tesla K80, Tesla K80)
Using default 16 lambda workers.
To maximize the degree of parallelism, add the following code to the beginning of the program:
"turicreate.config.set_runtime_config('TURI_DEFAULT_NUM_PYLAMBDA_WORKERS', 32)"
Note that increasing the degree of parallelism also increases the memory footprint.
---------------------------------------------------------------------------
MXNetError Traceback (most recent call last)
_ctypes/callbacks.c in 'calling callback function'()
~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/kvstore.py in updater_handle(key, lhs_handle, rhs_handle, _)
81 lhs = _ndarray_cls(NDArrayHandle(lhs_handle))
82 rhs = _ndarray_cls(NDArrayHandle(rhs_handle))
---> 83 updater(key, lhs, rhs)
84 return updater_handle
85
~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/optimizer/optimizer.py in __call__(self, index, grad, weight)
1528 self.sync_state_context(self.states[index], weight.context)
1529 self.states_synced[index] = True
-> 1530 self.optimizer.update_multi_precision(index, weight, grad, self.states[index])
1531
1532 def sync_state_context(self, state, context):
~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/optimizer/optimizer.py in update_multi_precision(self, index, weight, grad, state)
553 use_multi_precision = self.multi_precision and weight.dtype == numpy.float16
554 self._update_impl(index, weight, grad, state,
--> 555 multi_precision=use_multi_precision)
556
557 @register
~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/optimizer/optimizer.py in _update_impl(self, index, weight, grad, state, multi_precision)
535 if state is not None:
536 sgd_mom_update(weight, grad, state, out=weight,
--> 537 lazy_update=self.lazy_update, lr=lr, wd=wd, **kwargs)
538 else:
539 sgd_update(weight, grad, out=weight, lazy_update=self.lazy_update,
~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/ndarray/register.py in sgd_mom_update(weight, grad, mom, lr, momentum, wd, rescale_grad, clip_gradient, out, name, **kwargs)
~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py in _imperative_invoke(handle, ndargs, keys, vals, out)
90 c_str_array(keys),
91 c_str_array([str(s) for s in vals]),
---> 92 ctypes.byref(out_stypes)))
93
94 if original_output is not None:
~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/base.py in check_call(ret)
144 """
145 if ret != 0:
--> 146 raise MXNetError(py_str(_LIB.MXGetLastError()))
147
148
MXNetError: Cannot find argument 'lazy_update', Possible Arguments:
----------------
lr : float, required
Learning rate
momentum : float, optional, default=0
The decay rate of momentum estimates at each epoch.
wd : float, optional, default=0
Weight decay augments the objective function with a regularization term that penalizes large weights. The penalty scales with the square of the magnitude of each weight.
rescale_grad : float, optional, default=1
Rescale gradient to grad = rescale_grad*grad.
clip_gradient : float, optional, default=-1
Clip gradient to the range of [-clip_gradient, clip_gradient] If clip_gradient <= 0, gradient clipping is turned off. grad = max(min(grad, clip_gradient), -clip_gradient).
, in operator sgd_mom_update(name="", wd="0.0005", momentum="0.9", clip_gradient="0.025", rescale_grad="1.0", lr="0.001", lazy_update="True")
Interestingly, if I use CUDA 10.0 instead with Turi Create 5.6:
!pip install turicreate==5.6
!pip uninstall -y mxnet
!pip install mxnet-cu100==1.4.0.post0
the notebook still fails, but if I immediately uninstall turicreate and mxnet-cu100 and repeat the CUDA 8.0 steps above, it works without any issues (roughly the cell sequence sketched below).
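Roughly, the sequence that ends up working is (a sketch of the steps described above, not copied verbatim from my notebook):
!pip uninstall -y turicreate mxnet-cu100
!sudo rm /usr/local/cuda
!sudo ln -s /usr/local/cuda-8.0 /usr/local/cuda
!pip install turicreate==5.4
!pip install mxnet-cu80==1.1.0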
The last time it worked, I captured the environment with pip freeze > requirements.txt and then, after restarting the instance, ran pip install -r requirements.txt, but I still get the same error as above (unless I go through the CUDA 10.0 attempt first). What's going on here? Any advice is appreciated.
Upvotes: 0
Views: 418
Reputation: 1
Updating from mxnet 1.1.0 to 1.4.0 is the right fix; the error isn't related to the CUDA version but to the MXNet version itself.
The source code at https://github.com/apache/incubator-mxnet shows that mxnet 1.1.0 does not have the lazy_update parameter for the sgd_mom_update function. You can see this by comparing the sgd_mom_update call in the optimizer code at release tag 1.4.0 with the same code at release tag 1.1.0. The parameter was added in mxnet>=1.3.0, which is why your test succeeded on mxnet-cu100==1.4.0.post0.
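If you want to check which behavior a given install has without reading the source, the generated operator's docstring lists the arguments the backend accepts, so a quick (illustrative) probe is:
import mxnet as mx

print(mx.__version__)
# True on mxnet >= 1.3.0, where sgd_mom_update accepts lazy_update;
# False on 1.1.0, which is exactly the mismatch behind the MXNetError above
print('lazy_update' in mx.nd.sgd_mom_update.__doc__)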
Upvotes: 0