Reputation: 13
As a quick solution to get some images processed in batch within ML Engine, I am using the Cloud Storage Python library to download the images.
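For reference, here is a minimal sketch of the kind of download code in question (the bucket name and prefix are placeholders):

from google.cloud import storage

# Download every image under a prefix to local disk
client = storage.Client()
bucket = client.bucket("my-bucket")
for blob in bucket.list_blobs(prefix="images/"):
    blob.download_to_filename("/tmp/" + blob.name.split("/")[-1])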
Unfortunately it seems that when the job is sent to ML Engine, the library import fails with the following stack trace:
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 1, in <module>
    from google.cloud import storage
ImportError: cannot import name storage
I am pretty sure that the library is included in the ML Engine image (it would be weird if it weren't), so I am at a loss here; the program runs fine locally.
Upvotes: 1
Views: 1086
Reputation: 8389
The container does not include this package because typically you would use TensorFlow's file_io module, which works with GCS.

You have two options. Assuming you already know how to use and/or have code for google.cloud.storage, you can just add it as a requirement in your setup.py file (instructions), for example:
from setuptools import find_packages
from setuptools import setup
REQUIRED_PACKAGES = ['google-cloud-storage']
setup(
name='trainer',
version='0.1',
install_requires=REQUIRED_PACKAGES,
packages=find_packages(),
include_package_data=True,
description='My trainer application package.'
)
Or, you can use file_io
, which is especially useful if you don't actually need copies of the data but want to read them directly:
import tensorflow as tf
from tensorflow.python.lib.io import file_io

# Copy: file_io.copy takes a single source and destination path,
# so glob first and copy each match
for filename in file_io.get_matching_files("gs://my-bucket/myfiles/*"):
    file_io.copy(filename, "/tmp/" + filename.split("/")[-1])

# Glob and read directly from GCS, without making local copies
for filename in file_io.get_matching_files("gs://my-bucket/myfiles/*"):
    with file_io.FileIO(filename, mode='r') as f:
        contents = f.read()
        # Do something with contents
Finally, note that if you're using TensorFlow operations, TensorFlow's readers already know how to read from GCS and so there is no need to manually manipulate the files.
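For instance, here is a minimal sketch of that last point using TF 1.x's queue-based readers (the bucket, pattern, and image format are illustrative):

import tensorflow as tf

# TensorFlow readers accept gs:// paths directly; no manual download step
filename_queue = tf.train.string_input_producer(
    tf.train.match_filenames_once("gs://my-bucket/myfiles/*.png"))
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)
image = tf.image.decode_png(value)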
Upvotes: 2