eilalan

Copying tar.gz libraries to apache-beam workers

I hope that you are all well. I would like to copy a tools library onto the worker machines using the setup.py file. I have updated CUSTOM_COMMANDS:

    CUSTOM_COMMANDS = [
        # Download the SRA Toolkit archive onto the worker.
        ["wget", "-O", "/usr/local/sratoolkit.tar.gz", "http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-centos_linux64.tar.gz"],
        # Unpack it into /usr/local.
        ["tar", "-xzf", "/usr/local/sratoolkit.tar.gz", "-C", "/usr/local/"],
    ]
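
These commands only run if setup.py hooks them into the build and the job is launched with --setup_file pointing at that file. Beam's juliaset example, where the CUSTOM_COMMANDS name comes from, runs each entry through subprocess and raises on a non-zero exit code. The sketch below is modeled on that loop but is not the example's exact code: run_custom_commands is a hypothetical name, and the demo swaps wget/tar for harmless Python one-liners so it runs anywhere. Failing loudly matters here, because a wget that cannot reach the NCBI server should stop worker setup with an error instead of leaving /usr/local without the toolkit.

```python
import subprocess
import sys


def run_custom_commands(commands):
    """Run each argv list in order, as a setup.py custom build step would.

    Returns the combined stdout/stderr of every command and raises
    RuntimeError on the first non-zero exit code, so a failed wget or tar
    stops setup instead of passing silently.
    """
    outputs = []
    for command in commands:
        print('Running command: %s' % command)
        # No shell: each entry is an argv list, so paths need no quoting.
        proc = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,  # decode output as text (works on Python 3.5)
        )
        output, _ = proc.communicate()
        print('Command output: %s' % output)
        if proc.returncode != 0:
            raise RuntimeError(
                'Command %s failed: exit code: %s' % (command, proc.returncode))
        outputs.append(output)
    return outputs


if __name__ == '__main__':
    # Harmless stand-ins for the wget/tar entries.
    demo = [[sys.executable, '-c', 'print("step 1")'],
            [sys.executable, '-c', 'print("step 2")']]
    run_custom_commands(demo)
```

In setup.py this loop would sit in the run() method of a setuptools.Command subclass that is registered in setup(cmdclass=...) and chained into the build command's sub_commands, which is how the juliaset example wires it up.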

When I look for the executables in the /usr/local folder, I cannot find the tools that I copied to the worker. What is the right (and easiest) way to copy tool libraries onto the worker machines? I got the expected behavior with the local runner; now it's a matter of finding the right way to reproduce it with the Dataflow runner.
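
The wget -O flag fixes the archive's file name, but not the name of the directory tar unpacks it into, so it can help to search for the binaries instead of hard-coding a path, and to log the result from inside a DoFn so you see what the worker itself sees. A minimal sketch, assuming the archive unpacks into a directory matching sratoolkit* with a bin/ subdirectory (an assumption about the archive layout; check it with tar -tzf). find_tool is a hypothetical helper, and fastq-dump is used only as an example tool name.

```python
import glob
import os


def find_tool(name, root='/usr/local', pattern='sratoolkit*'):
    """Return sorted paths to executables called `name` under root/pattern/bin.

    The sratoolkit* pattern and the bin/ layout are assumptions about the
    archive; adjust them to match what `tar -tzf` reports.
    """
    candidates = glob.glob(os.path.join(root, pattern, 'bin', name))
    # Keep only regular files that the current user can execute.
    return sorted(p for p in candidates
                  if os.path.isfile(p) and os.access(p, os.X_OK))
```

For example, find_tool('fastq-dump') returning an empty list on a Dataflow worker would tell you the extraction did not happen where the code is running.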

I am using Python 3.5 with the latest apache-beam (2.20) and the latest Dataflow.

Thanks a lot,
eilalan

Answers (1)

Ankur

The worker machines are set up separately and might not be able to download files over the internet.
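
To check whether that applies to your job, you could try the download URL from inside a pipeline step (for example in a DoFn's process method) and log the result. A minimal standard-library sketch; can_reach is a hypothetical helper, and a False result only means the request failed (DNS error, refused or timed-out connection, or HTTP error status), not specifically that outbound traffic is blocked.

```python
import http.client
import urllib.request


def can_reach(url, timeout=10):
    """Return True if an HTTP HEAD request to `url` succeeds within `timeout` seconds.

    Network-level failures are reported as False rather than raised.
    """
    request = urllib.request.Request(url, method='HEAD')
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # urlopen raises HTTPError for 4xx/5xx; this check is a safeguard.
            return response.status < 400
    except (OSError, http.client.HTTPException):
        # urllib.error.URLError and HTTPError subclass OSError, as do
        # socket timeouts and refused connections.
        return False
```

Calling can_reach with the NCBI URL from the question and logging the result on a worker would show directly whether the wget step can work there.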

One way to do it is to upload the files to GCS manually:

    gsutil cp -r /mytools gs://my-bucket/mytools

and then, as the first custom command, copy them onto the worker:

    gsutil cp -r gs://my-bucket/mytools /mytools

After that, run the appropriate custom commands on the copied files.
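
Applied to the archive from the question, the worker-side commands could look like the list below: you upload sratoolkit.tar.gz to the bucket once from your own machine, and each worker then copies it from GCS and extracts it, so no access to the NCBI server is needed. This is a sketch: gs://my-bucket/mytools/sratoolkit.tar.gz is a placeholder path, gcs_setup_commands is a hypothetical helper, and it assumes gsutil is available on the worker, as the answer implies.

```python
import posixpath


def gcs_setup_commands(gcs_archive, dest_dir='/usr/local'):
    """Build CUSTOM_COMMANDS that copy a .tar.gz from GCS and unpack it.

    Assumes `gsutil` is on the worker's PATH.
    """
    local_archive = posixpath.join(dest_dir, posixpath.basename(gcs_archive))
    return [
        # Step 1: copy the archive from the bucket onto the worker.
        ['gsutil', 'cp', gcs_archive, local_archive],
        # Step 2: extract it, as in the original CUSTOM_COMMANDS.
        ['tar', '-xzf', local_archive, '-C', dest_dir],
    ]


# Placeholder bucket path; replace with where you uploaded the archive.
CUSTOM_COMMANDS = gcs_setup_commands('gs://my-bucket/mytools/sratoolkit.tar.gz')
```

Keeping the copy and the extraction as separate entries means a failure in either one shows up as its own failed command.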
