gavinest

Reputation: 348

Sync directory files to Google Cloud Composer dags/ folder

I'd like to sync the contents of a folder in my repository to the GCP Composer dags/ folder in a simple command.

The gcloud composer CLI seems to have a command for this; however, it emits a warning that support for wildcards is being removed.

>> gcloud composer environments storage dags import \
      --source="dir/*" \
      --environment={env_name} \
      --location={loc}
WARNING: Use of gsutil wildcards is no longer supported in --source. Set the storage/use_gsutil property to get the old behavior back temporarily. However, this property will eventually be removed.

Is there a non-deprecated way to use this command that has the same effect of expanding the contents of dir into the Composer dags/ folder? I've looked into gsutil rsync, but that command makes it very difficult to ignore certain files and directories. gcloud has a nice .gcloudignore file that handles this for you.
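For what it's worth, gsutil rsync can exclude paths via its -x flag, which takes a Python regex of relative paths to skip (it does not read .gcloudignore). A minimal sketch of building such a pattern from an ignore list (the patterns and bucket name below are made up):

```shell
# Approximate an ignore list with gsutil rsync's -x flag, which takes a
# Python regex of relative paths to exclude (it does not read .gcloudignore;
# the patterns and bucket name here are illustrative).
ignore=("__pycache__/.*" ".*\.pyc$" "tests/.*")
exclude_regex=$(IFS='|'; echo "${ignore[*]}")
echo "$exclude_regex"
# gsutil -m rsync -r -d -x "$exclude_regex" dags "gs://MY_COMPOSER_BUCKET/dags"
```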

Upvotes: 3

Views: 3093

Answers (4)

Jonas Ferreira

Reputation: 281

Automated Solution

There is a way to do this automatically, using Cloud Build and Cloud Source Repositories.

First of all, create a repository on Cloud Source Repositories containing your dags and plugins. Add a file called cloudbuild.yaml; this will be responsible for syncing your files with Cloud Storage.

├── cloudbuild.yaml
├── dags
│   └── airflow_monitoring.py
├── plugins
│   ├── hooks
│   │   └── my_hook.py
│   ├── operators
│   │   └── my_operator.py
│   └── sensors
│       └── my_sensor.py

Inside the cloudbuild.yaml, put the following:

steps:
- name: ubuntu
  args: ['bash', '-c', "echo '$COMMIT_SHA' > REVISION.txt"]
- name: gcr.io/cloud-builders/gsutil
  args:
    - '-m'
    - 'rsync'
    - '-d'
    - '-r'
    - 'dags'
    - 'gs://${_GCS_BUCKET}/dags'
- name: gcr.io/cloud-builders/gsutil
  args:
    - '-m'
    - 'rsync'
    - '-d'
    - '-r'
    - 'plugins'
    - 'gs://${_GCS_BUCKET}/plugins'

With the rsync command, only files that differ between source and destination are copied, and the -d flag also deletes files from the bucket that no longer exist in the source.

Now, go to Cloud Build and create a trigger with the following configuration:

The most important settings here are the Source (which will be a repository) and the Branch. Every push to this branch will trigger the build.

(screenshot: trigger settings, part 1)

In this second part, two things are important:

1 - The build configuration file (don't worry about this step if you followed the same folder structure mentioned above; if you have changed the location of the cloudbuild.yaml file, enter the path where it is located in the repository)

2 - A substitution variable called _GCS_BUCKET containing your Cloud Composer bucket name

(screenshot: trigger settings, part 2)

Then just click Create. From now on, the files in your repository will be synchronized to your Cloud Composer bucket every time you push to the master branch.
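If you want to run the same build once by hand (for example, to test it before setting up the trigger), you can submit it directly from the repository root; the bucket name below is a placeholder:

```shell
# One-off run of the same cloudbuild.yaml (bucket name is a placeholder)
gcloud builds submit . \
  --config=cloudbuild.yaml \
  --substitutions=_GCS_BUCKET=my-composer-bucket
```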

Upvotes: 2

Vahid Hallaji

Reputation: 7467

Apparently, wildcards are no longer supported in --source.

Using the gcloud composer command is probably more robust, and of course you don't need to specify the bucket name. So I used a for loop to import DAGs into the root of the DAGs folder. The gcloud command may respect the .gcloudignore file too.

for entry in "$DAG_DIRECTORY"/*; do \
    gcloud composer environments storage dags import \
    --environment "$GOOGLE_CLOUD_COMPOSER_ENVIRONMENT" \
    --location "$GOOGLE_CLOUD_LOCATION" \
    --project "$GOOGLE_CLOUD_PROJECT" \
    --source "$entry"; \
done
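One caveat: the loop imports every entry in the directory, including non-DAG files. If only Python files should be uploaded, the glob can be narrowed (variable names as in the loop above):

```shell
# Import only .py files; nullglob makes the loop run zero times when
# nothing matches, instead of passing a literal '*.py' to gcloud.
shopt -s nullglob
for entry in "$DAG_DIRECTORY"/*.py; do
    gcloud composer environments storage dags import \
        --environment "$GOOGLE_CLOUD_COMPOSER_ENVIRONMENT" \
        --location "$GOOGLE_CLOUD_LOCATION" \
        --project "$GOOGLE_CLOUD_PROJECT" \
        --source "$entry"
done
```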

Upvotes: 1

Nandakishore

Reputation: 1011

You could use gsutil rsync

gsutil rsync -r -c -d local_directory gs://GCS-BUCKET-NAME/dags

This will keep the local directory and the dags/ directory in sync every time you run the command, and it only updates the files that differ between the source and destination. (If you want to exclude files, add -x followed by a Python regex of paths to skip; -x requires that pattern argument.)

Upvotes: 2

rsantiago

Reputation: 2099

The command gcloud composer environments storage dags import imports from local storage into the Cloud Composer bucket. It seems that it doesn't synchronize the source and destination: in the examples, the existing files in the dags/ folder are not deleted; only new files are added.

Given that the gcloud command only copies the source content into the dags/ folder, gsutil can help:

gsutil cp -r dir/* gs://composer-bucket/dags

Upvotes: 0
