Ben Theunissen

Reputation: 13

Unable to import Google Cloud Storage library in ML Engine

As a quick way to batch-process some images within ML Engine, I am using the Cloud Storage Python library to download them.

Unfortunately it seems that when the job is sent to ML Engine, the library import fails with the following stack trace:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 1, in <module>
    from google.cloud import storage
ImportError: cannot import name storage

I am pretty sure that the library is included in the ML Engine image (it would be weird if it wasn't), so I am at a loss here; the program runs fine locally.

Upvotes: 1

Views: 1086

Answers (1)

rhaertel80

Reputation: 8389

The container does not include this package because, typically, you would use TensorFlow's file_io module, which works with GCS.

There are two options. Assuming you already know how to use google.cloud.storage and/or have code that relies on it, you can just add it as a requirement in your setup.py file (instructions), for example:

from setuptools import find_packages
from setuptools import setup

# Dependencies that ML Engine should pip-install alongside the trainer package.
REQUIRED_PACKAGES = ['google-cloud-storage']

setup(
    name='trainer',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    include_package_data=True,
    description='My trainer application package.'
)
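
If it helps, here is a minimal sketch of downloading objects with google.cloud.storage once that dependency is installed; the bucket name and prefix are placeholders you would replace with your own:

from google.cloud import storage

# Hypothetical bucket and prefix; substitute your own values.
client = storage.Client()
bucket = client.bucket('my-bucket')
for blob in bucket.list_blobs(prefix='myfiles/'):
    # Download each object to a local file under /tmp.
    blob.download_to_filename('/tmp/' + blob.name.split('/')[-1])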

Or, you can use file_io, which is especially useful if you don't actually need copies of the data but want to read them directly:

from tensorflow.python.lib.io import file_io

# Copy (file_io.copy takes a single source and destination path)
for filename in file_io.get_matching_files("gs://my-bucket/myfiles/*"):
  file_io.copy(filename, "/tmp/" + filename.split("/")[-1])

# Glob and read directly from GCS
for filename in file_io.get_matching_files("gs://my-bucket/myfiles/*"):
  with file_io.FileIO(filename, mode="r") as f:
    contents = f.read()  # Do something with the file contents

Finally, note that if you're using TensorFlow operations, TensorFlow's readers already know how to read from GCS and so there is no need to manually manipulate the files.
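
For instance, a rough sketch of the reader-based approach (the GCS path is a placeholder, and the exact input pipeline will depend on your model):

import tensorflow as tf

# Hypothetical bucket path; TensorFlow's file ops accept gs:// URIs directly.
filenames = tf.gfile.Glob("gs://my-bucket/myfiles/*.jpg")
filename_queue = tf.train.string_input_producer(filenames)
reader = tf.WholeFileReader()
_, contents = reader.read(filename_queue)
image = tf.image.decode_jpeg(contents, channels=3)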

Upvotes: 2
