Miguel Monteiro

Reputation: 379

TensorFlow new op in Google Cloud ML Engine

I have created a new TensorFlow op, compiled it, and tested it locally; it works.

I now want to use it with Google Cloud ML Engine, which requires compiling the op on each of the machines in the cloud.

I have managed to include the C++ files and the compiler script in the distribution using setup.py:

from setuptools import setup, find_packages

setup(
    name='trainer',
    version='0.1',
    packages=find_packages(),
    package_data={'CPP': ['*.cc', '*.h', 'compile.sh']},
    description='Package description'
)

Now I have to run compile.sh which contains:

TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')

g++ -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 -shared target.cc dependency_1.cc -o target.so -fPIC -I$TF_INC -I$TF_INC/external/nsync/public -O2

The script can be run from Python code using subprocess.Popen(), so that is not the issue.
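
For illustration, here is a minimal sketch of that step (not from the original post): it assumes the CPP package from package_data above is importable and that compile.sh is shipped next to the sources.

import os
import subprocess
import CPP  # hypothetical: the package that ships the sources, per package_data above

# Run the shipped compile script from the directory it was installed into.
cpp_dir = os.path.dirname(CPP.__file__)
proc = subprocess.Popen(['bash', 'compile.sh'], cwd=cpp_dir)
if proc.wait() != 0:
    raise RuntimeError('compile.sh failed')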

The issue is that I don't know how the directory tree is structured within Google Cloud, so I don't know where to run this script from, or how to access its output later on to make use of the new op.

Upvotes: 2

Views: 141

Answers (2)

Miguel Monteiro

Reputation: 379

Here is my solution. I used the Extension class from setuptools to build the op dynamically when the package is deployed in the cloud:

import tensorflow as tf
from setuptools import setup, find_packages, Extension


# Paths to the headers and shared library shipped with the TensorFlow pip package
TF_INC = tf.sysconfig.get_include()
TF_LIB = tf.sysconfig.get_lib()

module1 = Extension('op_name',
                    # match the ABI the prebuilt TensorFlow binaries use
                    define_macros=[('_GLIBCXX_USE_CXX11_ABI', 0)],
                    include_dirs=[TF_INC, TF_INC + '/external/nsync/public', 'Op/cpp'],
                    sources=['Op/cpp/op_name.cc',
                             'Op/cpp/source_2.cc'],
                    extra_compile_args=['-O2', '-std=c++11', '-fPIC', '-shared'],
                    library_dirs=[TF_LIB],
                    libraries=['tensorflow_framework'],
                    extra_link_args=[],
                    language='c++')

setup(
    name='trainer',
    version='0.1',
    packages=find_packages(),
    package_data={'Op': ['cpp/*.cc', 'cpp/*.h']},
    ext_modules=[module1],
)

Some notes:

  • If you have header files, you have to include the directory for those files in include_dirs. In this case, I have the header files in the same directory as the source files (Op/cpp).
  • However, this does not mean the .h files are packaged. For that you must use package_data={'Op': ['cpp/*.cc', 'cpp/*.h']} so that the .h files are included in the manifest. The .cc files should be included anyway since they are sources; I just have them here for completeness.
  • The compiler used in the cloud is gcc, which invokes cc1plus. The official TensorFlow documentation uses g++. I don't know what implications this has in terms of performance...
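
For completeness, here is a sketch (not part of the original answer) of how the shared object built by this setup.py could be located and loaded at run time; the exact file name setuptools produces depends on the platform and Python version.

import glob
import os
import sys
import tensorflow as tf

# Search the import path for the extension built from 'op_name' and load the
# first match as a TensorFlow op library.
candidates = []
for path in sys.path:
    candidates.extend(glob.glob(os.path.join(path, 'op_name*.so')))

op_module = tf.load_op_library(candidates[0])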

Upvotes: 0

Guoqing Xu

Reputation: 482

The easiest way is to build the op locally, upload it to your GCS bucket, copy it to the VM (container), and use it via tf.load_op_library. You can use the preinstalled gsutil CLI or the GCS Python client to perform the copy.
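
A rough sketch of that workflow from inside the training job (bucket path and file name are placeholders):

import subprocess
import tensorflow as tf

# Copy the prebuilt op from GCS using the preinstalled gsutil CLI, then load it.
subprocess.check_call(
    ['gsutil', 'cp', 'gs://YOUR_BUCKET/ops/target.so', '/tmp/target.so'])
op_lib = tf.load_op_library('/tmp/target.so')
# Ops registered in target.so are then available as op_lib.<registered_op_name>(...)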

Back to the original question: when we start a user job, we first install the user code package as root, so in the case of Python 2.7 it's located at /root/.local/lib/python2.7/site-packages/YOUR_PACKAGE_NAME
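
So, assuming the compiled library is shipped inside the package (the file name below is a placeholder), it can be loaded directly from that location:

import tensorflow as tf

op_lib = tf.load_op_library(
    '/root/.local/lib/python2.7/site-packages/YOUR_PACKAGE_NAME/target.so')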

Upvotes: 1
