Reputation: 348
I'd like to sync the contents of a folder in my repository to the GCP Composer dags/ folder in a simple command.
The gcloud composer cli seems to have a command for this however it leaves a warning that support for wildcards is being removed.
>> gcloud composer environments storage dags import \
--source="dir/*" \
--environment={env_name} \
--location={loc}
WARNING: Use of gsutil wildcards is no longer supported in --source. Set the storage/use_gsutil property to get the old behavior back temporarily. However, this property will eventually be removed.
Is there a non-deprecated way to use this command that has the same effect of expanding the contents of dir into the Composer dags/ folder? I've looked into gsutil rsync, but that command makes it very difficult to ignore certain files and directories. gcloud has a nice .gcloudignore file that handles this for you.
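For context, the kind of .gcloudignore I mean looks something like this (the patterns below are only an example):
.git
.gitignore
__pycache__/
*.pyc
tests/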
Upvotes: 3
Views: 3093
Reputation: 281
There is a way to do this automatically. You'll use Cloud Build and Cloud Source Repositories.
First of all, create a repository on Cloud Source Repositories containing your dags and plugins. Add a file called cloudbuild.yaml; it will be responsible for syncing your files with Cloud Storage.
├── cloudbuild.yaml
├── dags
│   └── airflow_monitoring.py
├── plugins
│   ├── hooks
│   │   └── my_hook.py
│   ├── operators
│   │   └── my_operator.py
│   └── sensors
│       └── my_sensor.py
Inside the cloudbuild.yaml, put the following:
steps:
# Record which commit produced this build
- name: ubuntu
  args: ['bash', '-c', "echo '$COMMIT_SHA' > REVISION.txt"]
# Sync the dags/ folder to the Composer bucket
- name: gcr.io/cloud-builders/gsutil
  args:
    - '-m'
    - 'rsync'
    - '-d'
    - '-r'
    - 'dags'
    - 'gs://${_GCS_BUCKET}/dags'
# Sync the plugins/ folder to the Composer bucket
- name: gcr.io/cloud-builders/gsutil
  args:
    - '-m'
    - 'rsync'
    - '-d'
    - '-r'
    - 'plugins'
    - 'gs://${_GCS_BUCKET}/plugins'
With the rsync command, only the files that have changed between source and destination are synchronized; the -d flag also deletes files at the destination that no longer exist in the source.
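If you want to try the same sync from your own machine before wiring up the trigger, the equivalent commands would look roughly like this (your-composer-bucket is a placeholder for your environment's bucket):
# One-off sync from a local checkout; replace the bucket name with your own.
gsutil -m rsync -d -r dags gs://your-composer-bucket/dags
gsutil -m rsync -d -r plugins gs://your-composer-bucket/plugins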
Now, go to Cloud Build and create a trigger with the following configuration:
The most important settings here are the Source (which will be a repository) and the Branch. Every push to this branch will trigger the build.
In this second part, two things are important:
1 - The build configuration file (don't worry about this step if you followed the folder structure mentioned above; if you have changed the location of the cloudbuild.yaml file, provide the path where it is located in the repository).
2 - A substitution variable called _GCS_BUCKET containing your Cloud Composer bucket name (see the sketch below for one way to look it up).
Then just click on Create. Now your repository will be synchronized with your Cloud Composer bucket every time you push something to the master branch.
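If you prefer the command line over the console, a rough sketch of the same setup looks like this; the flags may differ slightly between gcloud versions, and ENV_NAME, LOCATION, YOUR_REPO and YOUR_COMPOSER_BUCKET are placeholders:
# Find the Composer DAG bucket (the returned prefix ends in /dags)
gcloud composer environments describe ENV_NAME \
  --location LOCATION \
  --format="get(config.dagGcsPrefix)"

# Create the trigger from the CLI instead of the console (sketch only)
gcloud builds triggers create cloud-source-repositories \
  --repo=YOUR_REPO \
  --branch-pattern="^master$" \
  --build-config=cloudbuild.yaml \
  --substitutions=_GCS_BUCKET=YOUR_COMPOSER_BUCKET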
Upvotes: 2
Reputation: 7467
Apparently, wildcards are no longer supported in --source.
Using the gcloud composer command is probably more robust, and of course you don't need to specify the bucket name. So I used a for loop to import the DAGs into the root of the DAGs folder. The gcloud command may respect the .gcloudignore file too.
for entry in "$DAG_DIRECTORY"/*; do \
gcloud composer environments storage dags import \
--environment $GOOGLE_CLOUD_COMPOSER_ENVIRONMENT \
--location $GOOGLE_CLOUD_LOCATION \
--project $GOOGLE_CLOUD_PROJECT \
--source "$entry"; \
done
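For completeness, the variables above would be set to something like the following (all values are placeholders), and you can list what ended up in the environment afterwards:
DAG_DIRECTORY=./dags
GOOGLE_CLOUD_COMPOSER_ENVIRONMENT=my-env
GOOGLE_CLOUD_LOCATION=us-central1
GOOGLE_CLOUD_PROJECT=my-project

# Check which DAG files the environment now has
gcloud composer environments storage dags list \
  --environment $GOOGLE_CLOUD_COMPOSER_ENVIRONMENT \
  --location $GOOGLE_CLOUD_LOCATION \
  --project $GOOGLE_CLOUD_PROJECT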
Upvotes: 1
Reputation: 1011
You could use gsutil rsync:
gsutil rsync -r -c -d -x 'PATTERN_TO_EXCLUDE' local_directory gs://GCS-BUCKET-NAME/dags
This will keep the local directory and the dags directory in sync every time you run the command, and it only uploads the files that have changed between source and destination. The -x flag takes a Python regular expression of paths to exclude, and -d deletes files at the destination that are no longer present locally.
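For instance, a run that skips compiled Python files and a local virtualenv might look like this (the exclusion regex and bucket name are only illustrative):
gsutil -m rsync -r -c -d -x '.*__pycache__.*|.*\.pyc$|venv/.*' local_directory gs://GCS-BUCKET-NAME/dags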
Upvotes: 2
Reputation: 2099
The command gcloud composer environments storage dags import imports from local storage into the Cloud Composer bucket. It doesn't seem to synchronize the source and destination: in the examples, the existing files in the dags/ folder are not deleted, only new files are added.
Given that the gcloud command only copies the source content to the dags/ folder, gsutil can help:
gsutil cp -r dir/* gs://composer-bucket/dags
Upvotes: 0