Reputation: 379
I have created a new Tensorflow op, I have compiled and tested it locally and it works.
I now want to use it with Google Cloud ML engine which requires compiling the op in each of the machines in the cloud.
I have managed to include the C++ files and the compile script in the distribution using setup.py:
from setuptools import setup, find_packages

setup(
    name='trainer',
    version='0.1',
    packages=find_packages(),
    package_data={'CPP': ['*.cc', '*.h', 'compile.sh']},
    description='Package description',
)
Now I have to run compile.sh, which contains:
TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
g++ -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 -shared target.cc dependency_1.cc -o target.so -fPIC -I$TF_INC -I$TF_INC/external/nsync/public -O2
The script can be run from Python code using subprocess.Popen(), so that is not the issue.
The issue is that I don't know how the directory tree is structured within Google Cloud, so I don't know where to run this script or how to access its output later on to make use of the new op.
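For reference, driving the compile script from Python can look like the sketch below. The helper and the stand-in `echo` command are illustrative (in the trainer you would call it with `['bash', 'compile.sh']`); only `subprocess` from the standard library is used.

```python
import subprocess

def run_script(cmd):
    """Run a command, return its stdout as text; raise if it exits non-zero."""
    proc = subprocess.Popen(cmd,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError('command failed: %s' % err.decode())
    return out.decode()

# In the trainer this would be run_script(['bash', 'compile.sh']);
# a harmless command is used here so the sketch runs anywhere.
print(run_script(['echo', 'compiled']))
```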
Upvotes: 2
Views: 141
Reputation: 379
Here is my solution. I used the Extension module from setuptools to build the op dynamically when the package is deployed in the cloud:
import tensorflow as tf
from setuptools import setup, find_packages, Extension

TF_INC = tf.sysconfig.get_include()
TF_LIB = tf.sysconfig.get_lib()

module1 = Extension('op_name',
                    define_macros=[('_GLIBCXX_USE_CXX11_ABI', 0)],
                    include_dirs=[TF_INC, TF_INC + '/external/nsync/public', 'Op/cpp'],
                    sources=['Op/cpp/op_name.cc',
                             'Op/cpp/source_2.cc'],
                    extra_compile_args=['-O2', '-std=c++11', '-fPIC', '-shared'],
                    library_dirs=[TF_LIB],
                    libraries=['tensorflow_framework'],
                    extra_link_args=[],
                    language='c++')

setup(
    name='trainer',
    version='0.1',
    packages=find_packages(),
    package_data={'Op': ['cpp/*.cc', 'cpp/*.h']},
    ext_modules=[module1],
)
Some notes:
- include_dirs: in this case, I have the header files in the same directory as the source files (Op/cpp).
- You must make sure the .h files are packaged. For that you must use package_data={'Op': ['cpp/*.cc', 'cpp/*.h']} so that the .h files are included in the manifest. The .cc files should be included anyway, since they are sources; I just have them here because.
- setuptools compiles with gcc, which uses cc1plus, while the official TensorFlow documentation uses g++. I don't know what implications this has in terms of performance...
Upvotes: 0
Reputation: 482
The easiest way is to build the op locally, upload it to your GCS bucket, copy it to the VM (container), and use it via tf.load_op_library. You can use the preinstalled gsutil CLI or the GCS Python client to perform the copy.
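A sketch of that route, with a hypothetical bucket path; gsutil is assumed to be on the worker's PATH, and TensorFlow is imported lazily so the helper can be defined anywhere:

```python
import subprocess

def gsutil_cp_cmd(gcs_path, local_path):
    """Build the gsutil command that copies an object to local disk."""
    return ['gsutil', 'cp', gcs_path, local_path]

def fetch_and_load(gcs_path, local_path='/tmp/target.so'):
    """Copy the compiled op from GCS, then hand it to tf.load_op_library."""
    subprocess.check_call(gsutil_cp_cmd(gcs_path, local_path))
    import tensorflow as tf  # available on ML Engine workers
    return tf.load_op_library(local_path)

# Usage on the worker (bucket name is hypothetical):
#   op_module = fetch_and_load('gs://my-bucket/ops/target.so')
```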
Back to the original question: when we start a user job, we first install the user code package as root, so in the case of Python 2.7 it's located at /root/.local/lib/python2.7/site-packages/YOUR_PACKAGE_NAME.
Upvotes: 1