Salma R

Reputation: 204

ImportError: No module named tensorflow_transform.beam

When submitting a Dataflow job to GCP I get this error:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 766, in run
    self._load_main_session(self.local_staging_directory)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 482, in _load_main_session
    pickler.load_session(session_file)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 266, in load_session
    return dill.load_session(file_path)
  File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 402, in load_session
    module = unpickler.load()
  File "/usr/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1139, in load_reduce
    value = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 818, in _import_module
    return __import__(import_name)
ImportError: No module named tensorflow_transform

I assumed that requirements such as tensorflow-transform and apache-beam were pre-installed on the Dataflow workers. The same job worked a few months ago.

Upvotes: 3

Views: 2011

Answers (2)

gilgamash

Reputation: 902

Even though this question is several years old, it is still relevant. A note on Salma's answer: passing setup_file as a keyword argument to PipelineOptions no longer works. Instead, import SetupOptions as well and set the setup file like this:

from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions
options.view_as(SetupOptions).setup_file = "/yourPath/setup.py"

Also note that the file must be named "setup.py". Here, options is an instance of PipelineOptions that was created earlier in the script.
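To put the pieces together, here is a minimal sketch of how the options object could be built. The helper names resolve_setup_file and make_options are hypothetical, not part of Beam; only PipelineOptions, SetupOptions, and view_as come from the Beam SDK. The name check simply mirrors the "setup.py" requirement mentioned above so that a wrong path fails early.

```python
import os


def resolve_setup_file(path):
    """Return an absolute path, insisting that the file is named setup.py."""
    # Hypothetical helper: fail early if the file has a different name.
    if os.path.basename(path) != "setup.py":
        raise ValueError("expected a file named setup.py, got %r" % path)
    return os.path.abspath(path)


def make_options(setup_path, argv=None):
    """Build PipelineOptions and set setup_file through SetupOptions."""
    # Imported lazily so the path helper above works without Beam installed.
    from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

    # argv is an optional list of command-line style flags.
    options = PipelineOptions(argv)
    options.view_as(SetupOptions).setup_file = resolve_setup_file(setup_path)
    return options
```

With this in place, a call such as make_options("/yourPath/setup.py") would return an options object that can be passed to the pipeline as before.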

Upvotes: 0

Salma R

Reputation: 204

Here is the solution, posted for anyone facing the same issue.

You need a setup.py file in the same directory as the script you are running, assuming that script contains all the Beam steps.

import setuptools

setuptools.setup(
    name='whatever-name',
    version='0.0.1',
    install_requires=[
        'apache-beam==2.10.0',
        'tensorflow-transform==0.12.0',
    ],
    packages=setuptools.find_packages(),
)

In the Python file I had:

from apache_beam.options.pipeline_options import PipelineOptions
options = PipelineOptions()

which had to be changed to:

options = PipelineOptions(setup_file="./setup.py")
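One caveat with "./setup.py": a relative path is resolved against the current working directory, not against the script, so it only works when the job is launched from the script's own directory. A minimal sketch of one way around this, assuming the layout described above (setup.py next to the pipeline script); the helper name setup_file_beside is hypothetical:

```python
import os


def setup_file_beside(script_path):
    """Return the absolute path of a setup.py in the same directory as script_path."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, "setup.py")


# In the pipeline script itself, one would typically pass __file__:
#   options = PipelineOptions(setup_file=setup_file_beside(__file__))
```

The returned path does not depend on where the script is started from.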

Upvotes: 6
