Reputation: 204
When submitting a Dataflow job to GCP I get this error:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 766, in run
    self._load_main_session(self.local_staging_directory)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 482, in _load_main_session
    pickler.load_session(session_file)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 266, in load_session
    return dill.load_session(file_path)
  File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 402, in load_session
    module = unpickler.load()
  File "/usr/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1139, in load_reduce
    value = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 818, in _import_module
    return __import__(import_name)
ImportError: No module named tensorflow_transform
I assumed that requirements such as tensorflow-transform and apache-beam were pre-installed, and this used to work a few months ago.
Upvotes: 3
Views: 2011
Reputation: 902
Even though this topic is several years old, it is still relevant. Hence a comment on Salma's answer: passing setup_file as a PipelineOptions argument no longer works. Instead, you have to import SetupOptions as well and set the setup file like this:
options.view_as(SetupOptions).setup_file = "/yourPath/setup.py"
Also note that the file name must be "setup.py". Here, options is an instance of PipelineOptions defined earlier.
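For completeness, here is a minimal sketch of that approach with its imports, assuming the Apache Beam Python SDK is installed. The helper names resolve_setup_file and build_options are hypothetical and not part of Beam; only PipelineOptions, SetupOptions and view_as come from the library.

```python
import os


def resolve_setup_file(directory):
    """Return the absolute path of setup.py inside `directory`.

    Hypothetical helper: it fails early if the file is missing, instead of
    letting the job fail later on the workers.
    """
    path = os.path.abspath(os.path.join(directory, "setup.py"))
    if not os.path.isfile(path):
        raise IOError("setup.py not found in %s" % directory)
    return path


def build_options(setup_file):
    """Create PipelineOptions with setup_file set through SetupOptions."""
    # Imported here so that resolve_setup_file can be used without Beam installed.
    from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

    options = PipelineOptions()
    options.view_as(SetupOptions).setup_file = setup_file
    return options
```

In the pipeline script this could be used as options = build_options(resolve_setup_file(os.path.dirname(os.path.abspath(__file__)))), assuming setup.py sits next to that script.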
Upvotes: 0
Reputation: 204
Here is the solution; I am posting it for anyone facing the same issue.
You need a setup.py file in the same directory as the file you are running, assuming that file contains all the Beam steps:
import setuptools

setuptools.setup(
    name='whatever-name',
    version='0.0.1',
    install_requires=[
        'apache-beam==2.10.0',
        'tensorflow-transform==0.12.0'
    ],
    packages=setuptools.find_packages(),
)
In the Python file I had:
options = PipelineOptions()
which had to be changed to:
options = PipelineOptions(setup_file="./setup.py")
Upvotes: 6