Reputation: 1
My code works with the DirectRunner, but I get import errors when I switch to the DataflowRunner because the lxml module is not found. Passing a setup file alongside the main code (--setup_file setup.py) still does not work. This is my setup.py:
setuptools.setup(
    name='lxml',
    version='4.2.5',
    install_requires=[],
    packages=setuptools.find_packages(),
)
Error: ImportError: No module named lxml [while running 'Run Query']
Any help/suggestions to overcome this error? Thanks.
Upvotes: 0
Views: 165
Reputation: 715
The name you pass to the setuptools.setup function is the name of your own package; its dependencies should be specified in the install_requires argument. I would imagine it works with the DirectRunner because lxml is already installed on your local machine.
The Beam juliaset example provides a sample setup.py file:
REQUIRED_PACKAGES = ['numpy']

setuptools.setup(
    name='juliaset',  # this is their package name
    version='0.0.1',
    description='Julia set workflow package.',
    install_requires=REQUIRED_PACKAGES,
    ...)
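Applied to the question, a setup.py along the following lines might work. This is only a sketch: the package name my_pipeline is a hypothetical placeholder for your own package, and pinning lxml to 4.2.5 is an assumption based on the version number in the question.

```python
import setuptools

# Third-party packages the Dataflow workers should install from PyPI.
# Assumption: 4.2.5 is the lxml version you want, as in the question.
REQUIRED_PACKAGES = ['lxml==4.2.5']

setuptools.setup(
    name='my_pipeline',  # hypothetical: the name of YOUR package, not lxml
    version='0.0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
)
```

You would then launch the pipeline with --runner DataflowRunner --setup_file ./setup.py in addition to your other Dataflow options.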
If lxml is your only dependency, or all your dependencies are on PyPI, you should be able to use the much simpler requirements.txt file. In general, the setup.py approach requires much more boilerplate.
To use requirements.txt, freeze your dependencies:
pip freeze > requirements.txt
And pass the requirements.txt file to your pipeline:
--requirements_file requirements.txt
See also the Beam documentation page on managing Python pipeline dependencies, which covers the various dependency patterns.
Upvotes: 1