Reputation: 161
I am trying to run an Apache Beam pipeline with the DirectRunner in Cloud Build, and to do that I need to install the requirements for the Python script, but I am facing some errors.
This is part of my cloudbuild.yaml
steps:
- name: gcr.io/cloud-builders/gcloud
  entrypoint: 'bash'
  args: [ '-c', "gcloud secrets versions access latest --secret=env --format='get(payload.data)' | tr '_-' '/+' | base64 -d > .env" ]
  id: GetSecretEnv
# - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
#   entrypoint: 'bash'
#   args: ['-c', 'gcloud config set app/cloud_build_timeout 1600 && gcloud app deploy --quiet tweepy-to-pubsub/app.yaml']
- name: gcr.io/cloud-builders/gcloud
  id: Access id_github
  entrypoint: 'bash'
  args: [ '-c', 'gcloud secrets versions access latest --secret=id_github > /root/.ssh/id_github' ]
  volumes:
  - name: 'ssh'
    path: /root/.ssh
# Set up git with key and domain
- name: 'gcr.io/cloud-builders/git'
  id: Set up git with key and domain
  entrypoint: 'bash'
  args:
  - '-c'
  - |
    chmod 600 /root/.ssh/id_github
    cat <<EOF >/root/.ssh/config
    Hostname github.com
    IdentityFile /root/.ssh/id_github
    EOF
    ssh-keyscan -t rsa github.com > /root/.ssh/known_hosts
  volumes:
  - name: 'ssh'
    path: /root/.ssh
- name: 'gcr.io/cloud-builders/git'
  # Connect to the repository
  id: Connect and clone repository
  dir: workspace
  args:
  - clone
  - --recurse-submodules
  - git@github.com:x/repo.git
  volumes:
  - name: 'ssh'
    path: /root/.ssh
- name: 'gcr.io/$PROJECT_ID/dataflow-python3'
  entrypoint: '/bin/bash'
  args: [ '-c', 'source /venv/bin/activate' ]
- name: 'gcr.io/$PROJECT_ID/dataflow-python3'
  entrypoint: '/bin/bash'
  dir: workspace
  args: ['pip', 'install', '-r', '/dir1/dir2/requirements.txt']
- name: 'gcr.io/$PROJECT_ID/dataflow-python3'
  entrypoint: 'python'
  dir: workspace
  args: [ 'dir1/dir2/script.py', '--runner=DirectRunner' ]
timeout: "1600s"
Without the step that installs the requirements this works, but I need the libs because the Python script fails with missing imports. With it, the build fails on the pip install step (logged as Step #5 in the original form of the cloud build) with this error:
Step #5: Already have image (with digest): gcr.io/x/dataflow-python3
Step #5: import-im6.q16: unable to open X server `' @ error/import.c/ImportImageCommand/360.
Step #5: import-im6.q16: unable to open X server `' @ error/import.c/ImportImageCommand/360.
Step #5: /usr/local/bin/pip: line 5: from: command not found
Step #5: /usr/local/bin/pip: pip: line 7: syntax error near unexpected token `('
Step #5: /usr/local/bin/pip: pip: line 7: ` sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])'
How do I fix this? I have also tried some examples from the internet and they don't work.
Edit: First I deploy on App Engine, then I download the repo in the Cloud Build VM, install the requirements, and try to run the Python script.
Upvotes: 0
Views: 2362
Reputation: 75705
I think that the issue comes from your path definitions:
'source /venv/bin/activate'
and
'pip', 'install', '-r', '/dir1/dir2/requirements.txt'
You use absolute paths, and that doesn't work on Cloud Build: the current working directory is /workspace/. If you use relative paths (simply add a dot . before the path), it should work better.
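For example, the pip step would then reference the file relative to /workspace (a sketch, assuming the requirements end up under workspace/dir1/dir2 after your clone step):
- name: 'gcr.io/$PROJECT_ID/dataflow-python3'
  entrypoint: '/bin/bash'
  dir: workspace
  args: ['pip', 'install', '-r', './dir1/dir2/requirements.txt']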
Or not... Indeed, you activate the venv in one step and run pip install in the following step. From one step to the next, the runtime environment is offloaded and reloaded with the next container; only the /workspace directory (and any declared volumes) persists between steps. Thus, the environment set up by your source command has disappeared by the time the pip step runs.
That is also why the errors look so strange: the pip step's entrypoint is /bin/bash without -c, so bash tries to execute the pip launcher script itself as a shell script. Its import lines are run as the ImageMagick import command (hence import-im6.q16: unable to open X server), and the rest of the Python file produces the shell syntax errors.
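A minimal two-step sketch makes the point (hypothetical steps; the gcr.io/cloud-builders/gcloud image is used here only because it ships bash):
steps:
- name: 'gcr.io/cloud-builders/gcloud'
  entrypoint: 'bash'
  args: [ '-c', 'export FOO=bar; echo "step 1: FOO=$FOO"' ]   # prints "step 1: FOO=bar"
- name: 'gcr.io/cloud-builders/gcloud'
  entrypoint: 'bash'
  args: [ '-c', 'echo "step 2: FOO=$FOO"' ]   # prints "step 2: FOO=" (the first container is gone)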
In addition, your Cloud Build environment is created for the build and destroyed afterwards. You don't need a venv in this case, and you can simplify the last 3 steps like this:
- name: 'gcr.io/$PROJECT_ID/dataflow-python3'
  entrypoint: '/bin/bash'
  args:
  - '-c'
  - |
    pip install -r ./dir1/dir2/requirements.txt
    python ./dir1/dir2/script.py --runner=DirectRunner
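If you really did need the venv, the activation and the commands that depend on it would have to run in the same step, for example (a sketch, assuming your custom image creates the venv at /venv, as your activation step suggests):
- name: 'gcr.io/$PROJECT_ID/dataflow-python3'
  entrypoint: '/bin/bash'
  args:
  - '-c'
  - |
    source /venv/bin/activate
    pip install -r ./dir1/dir2/requirements.txt
    python ./dir1/dir2/script.py --runner=DirectRunner
But since the whole build environment is thrown away at the end anyway, installing directly into the container's Python, as above, is the simpler option.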
Upvotes: 2