Bashammakh Yazeed

Reputation: 43

Apache Beam PyPI packages downloading forever

I am running an Apache Beam pipeline on Dataflow with three PyPI packages listed in a requirements.txt file. When I run the pipeline with the option "--requirements_file=requirements.txt", it issues the following command to download the PyPI packages:

python -m pip download --dest /tmp/requirements-cache -r requirements.txt --exists-action i --no-binary :all:

This command takes a very long time to download the packages. I also tried running it manually, and it runs forever.

Why does Apache Beam use the --no-binary :all: option? It appears to be the root cause of the long download time. Am I making a mistake, or is there another way to reduce the pip download time?

Upvotes: 0

Views: 375

Answers (1)

robertwb

Reputation: 5104

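This is because the packages need to be installed on the workers, and Beam doesn't want to download binaries specific to whatever machine you happen to be launching the pipeline from. The --no-binary :all: flag makes pip fetch source distributions rather than prebuilt wheels, so the packages can be built for the workers' own platform.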
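To illustrate why a prebuilt binary can be tied to the launching machine, here is a minimal sketch that reads the platform tag out of a wheel filename. The helper names (`wheel_platform_tag`, `is_machine_specific`) are made up for this example, and the filenames are only illustrative; it relies on the standard wheel naming scheme, where the last dash-separated field is the platform tag.

```python
from typing import Optional


def wheel_platform_tag(filename: str) -> Optional[str]:
    """Return the platform tag of a wheel filename, or None for non-wheels.

    Wheel names follow: name-version[-build]-python-abi-platform.whl
    (hypothetical helper, not part of pip or Beam).
    """
    if not filename.endswith(".whl"):
        return None  # e.g. a source distribution such as .tar.gz
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"not a valid wheel filename: {filename}")
    return parts[-1]


def is_machine_specific(filename: str) -> bool:
    """True if the file is a wheel built for a particular platform."""
    tag = wheel_platform_tag(filename)
    return tag is not None and tag != "any"


if __name__ == "__main__":
    for name in (
        "numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "requests-2.31.0-py3-none-any.whl",
        "numpy-1.26.4.tar.gz",
    ):
        print(name, "->", is_machine_specific(name))
```

A wheel whose platform tag is something other than `any` only works on matching machines, which is the kind of file the download step avoids by asking for sources.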

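If you have a lot of dependencies, the best solution is to use custom containers, so that the dependencies are already installed in the worker image instead of being downloaded when the pipeline is launched.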
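As a rough sketch of the custom-container route, the launch arguments might be assembled like this. The project, region, bucket, and image names are placeholders, and `dataflow_args` is a hypothetical helper; the sketch assumes a recent Beam Python SDK where the image is passed with `--sdk_container_image`, and that the image was built beforehand with the three packages already installed.

```python
from typing import List


def dataflow_args(
    project: str,
    region: str,
    temp_location: str,
    sdk_container_image: str,
) -> List[str]:
    """Build pipeline arguments that point Dataflow at a custom SDK image.

    Note there is no --requirements_file here: the dependencies are assumed
    to be baked into the container image (hypothetical helper for illustration).
    """
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        f"--sdk_container_image={sdk_container_image}",
    ]


if __name__ == "__main__":
    # All values below are placeholders, not real resources.
    args = dataflow_args(
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
        sdk_container_image="gcr.io/my-project/beam-custom:latest",
    )
    print(" ".join(args))
    # In a real pipeline these would typically be passed on, for example:
    #   from apache_beam.options.pipeline_options import PipelineOptions
    #   options = PipelineOptions(args)
```

Because the packages live in the image, the launch no longer needs the slow pip download step at all.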

Upvotes: 1
