Szere Dyeri

Reputation: 15246

Can I use my local dependencies with Dataflow for Python SDK?

Dataflow for Python SDK has a --requirements_file option that takes a standard requirements.txt and installs its contents on the workers before running. Are there any restrictions on what these files can contain? Specifically, can I use all pip flags (e.g. --editable or -e) to install my local packages?

Upvotes: 1

Views: 1235

Answers (1)

Szere Dyeri

Reputation: 15246

Dataflow for Python SDK will run pip install -r requirements.txt before starting your workload. It is important that all the items referenced in the requirements file are accessible to the worker machines. Dependencies on PyPI, or in some other accessible location (e.g. over HTTP), will install correctly. Local packages (e.g. -e my_package) will not, because the workers cannot reach them.

The --extra_package option lets you stage local packages in an accessible way. Instead of listing a local package in requirements.txt, create a tarball of it (e.g. my_package.tar.gz) and pass that tarball with --extra_package.
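To make the split concrete, here is a rough sketch in plain Python (no Dataflow import needed) that sorts requirement lines into ones the workers can fetch and ones that point at local paths, then builds the matching command-line flags. The helper names split_requirements and build_flags and the example paths are hypothetical, and the "is this local?" check is only a heuristic I am assuming for illustration, not something the SDK does for you.

```python
def is_local(line):
    """Heuristic (an assumption, not SDK behavior): does this line point at a local path?"""
    for prefix in ("-e ", "--editable "):
        if line.startswith(prefix):
            target = line[len(prefix):].strip()
            # Editable installs from a VCS or HTTP URL are reachable by workers;
            # bare paths and file: URLs are not.
            return target.startswith("file:") or "://" not in target
    return line.startswith((".", "/"))


def split_requirements(lines):
    """Return (remote, local) requirement lines, skipping blanks and comments."""
    remote, local = [], []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        (local if is_local(line) else remote).append(line)
    return remote, local


def build_flags(requirements_file, tarballs):
    """Build pipeline flags: one --requirements_file, one --extra_package per tarball."""
    flags = ["--requirements_file", str(requirements_file)]
    for tarball in tarballs:
        flags += ["--extra_package", str(tarball)]
    return flags


lines = [
    "requests==2.31.0",
    "# local helper library",
    "-e ./my_package",
    "-e git+https://example.com/repo.git#egg=repo",
]
remote, local = split_requirements(lines)

# Keep only `remote` in requirements.txt; package each `local` entry as a
# tarball (e.g. with `python setup.py sdist`) and stage it instead.
flags = build_flags("requirements.txt", ["dist/my_package.tar.gz"])
print(remote)
print(local)
print(flags)
```

The resulting flags list can be appended to the arguments you already pass when launching the pipeline.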

Managing Python Pipeline Dependencies has more details on these options.

Upvotes: 2
