Ari

Reputation: 6159

Automating a Cloud Firestore export in python3

I want to set up an automated backup service for my Cloud Firestore database with a microservice built on Flask. The command I need to use is:

gcloud beta firestore export gs://[BUCKET_NAME]

That's the command I'd like to run via my App Engine microservice:

import subprocess
from datetime import datetime, timezone

@app.route('/backup', methods=["GET", "POST"])
def backup():
    subprocess.call('gcloud beta firestore export gs://bucket-name --async', shell=True)
    return f"Backup process started successfully, you can close this window. {datetime.now(timezone.utc)}"

But it doesn't look like anything is happening. I'm assuming that's because my App Engine instance doesn't have the Cloud SDK.

Is this something I could do in a Cloud Function instead?

Upvotes: 1

Views: 1784

Answers (3)

Juan Lara

Reputation: 6854

Here's an example app that you can call with the Google App Engine Cron Service. It's based on the Node.js example in the docs:

app.yaml

runtime: python37

handlers:
- url: /.*
  script: auto

If you already have a default service deployed, add service: cloud-firestore-admin to app.yaml to deploy this as a new service.
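For example, app.yaml for a non-default service would start like this (the service name here is just illustrative):

runtime: python37
service: cloud-firestore-admin

handlers:
- url: /.*
  script: auto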

requirements.txt

Flask
google-api-python-client

The google-api-python-client simplifies access to the Cloud Firestore REST API.

main.py

import datetime
import os
from googleapiclient.discovery import build

from flask import Flask, request

app = Flask(__name__) 

@app.route('/cloud-firestore-export')
def export():
    # Deny if not from the GAE Cron Service
    assert request.headers['X-Appengine-Cron']
    # Deny if outputUriPrefix not set correctly
    outputUriPrefix = request.args.get('outputUriPrefix')
    assert outputUriPrefix and outputUriPrefix.startswith('gs://')
    # Use a timestamp in the export directory name
    timestamp = datetime.datetime.now().strftime('%Y%m%d-%H%M%S')
    # Add a trailing slash if missing, then append the timestamp
    if not outputUriPrefix.endswith('/'):
        outputUriPrefix += '/' + timestamp
    else:
        outputUriPrefix += timestamp
    if 'collections' in request.args:
        collections = request.args.get('collections').split(',')
    else:
        collections = None

    body = {
        'collectionIds': collections,
        'outputUriPrefix': outputUriPrefix,
    }
    # Build REST API request for 
    # https://cloud.google.com/firestore/docs/reference/rest/v1/projects.databases/exportDocuments
    project_id = os.environ.get('GOOGLE_CLOUD_PROJECT')
    database_name = 'projects/{}/databases/(default)'.format(project_id)
    service = build('firestore', 'v1')
    service.projects().databases().exportDocuments(name=database_name, body=body).execute()
    return 'Operation started' 


if __name__ == '__main__':
    # This is used when running locally only. When deploying to Google App
    # Engine, a webserver process such as Gunicorn will serve the app. This
    # can be configured by adding an `entrypoint` to app.yaml.
    # Flask's development server will automatically serve static files in
    # the "static" directory. See:
    # http://flask.pocoo.org/docs/1.0/quickstart/#static-files. Once deployed,
    # App Engine itself will serve those files as configured in app.yaml.
    app.run(host='127.0.0.1', port=8080, debug=True)

cron.yaml

cron:
- description: "Daily Cloud Firestore Export"
  url: /cloud-firestore-export?outputUriPrefix=gs://BUCKET_NAME&collections=COLLECTIONS_LIST
  schedule: every 24 hours

If you deployed to a non-default service in app.yaml, add it here too with target: cloud-firestore-admin.
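Deploy both files with the gcloud CLI (assuming it is already configured for your project):

gcloud app deploy app.yaml cron.yaml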

Access permissions for App Engine Service account

Once deployed, the app uses the App Engine service account to authorize export requests. Make sure your App Engine service account has permissions for Cloud Firestore and for your Storage bucket; see:

https://cloud.google.com/firestore/docs/solutions/schedule-export#configure_access_permissions
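For reference, the bindings from that page look roughly like this, assuming the default App Engine service account PROJECT_ID@appspot.gserviceaccount.com (PROJECT_ID and BUCKET_NAME are placeholders):

# Allow the service account to run Firestore export/import operations
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member serviceAccount:PROJECT_ID@appspot.gserviceaccount.com \
    --role roles/datastore.importExportAdmin

# Allow the service account to write to the destination bucket
gsutil iam ch serviceAccount:PROJECT_ID@appspot.gserviceaccount.com:admin gs://BUCKET_NAME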

Upvotes: 4

baranbartu

Reputation: 63

An up-to-date and more convenient way could look like the code below.

PS1: I have just run this code and copied it as is.

PS2: The default method inputs might be confusing, e.g. '{your-project-prefix}-develop' could be gcp-project-id-develop or gcp-project-id-staging, whichever GCP project you will run the code against.

# pylint: disable=missing-module-docstring,missing-function-docstring,import-outside-toplevel
# pylint: disable=too-many-arguments,unused-argument,no-value-for-parameter
import os
import logging
from typing import List, Optional

from google.cloud import storage
from google.cloud.firestore_admin_v1.services.firestore_admin import client as admin_client
from google.cloud.firestore_admin_v1.types import firestore_admin

logger = logging.getLogger(__name__)


def create_storage_bucket_if_not_exists(
    gcp_project_id: str = '{your-project-prefix}-develop',
    bucket_name: str = '{your-project-prefix}-backup-firestore-develop'
):
    storage_client = storage.Client(project=gcp_project_id)
    bucket = storage_client.bucket(bucket_name)
    if not bucket.exists():
        bucket.storage_class = 'STANDARD'
        storage_client.create_bucket(bucket, location='us')


def get_client(
    service_account: str = '{your-service-account-path}'
):
    # Point GOOGLE_APPLICATION_CREDENTIALS at the service account key file
    # before constructing the admin client, so the client picks it up.
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = service_account

    firestore_admin_client = admin_client.FirestoreAdminClient()
    return firestore_admin_client


def get_database_name(
    client: admin_client.FirestoreAdminClient,
    gcp_project_id: str = '{your-project-prefix}-develop'
):
    return client.database_path(gcp_project_id, '(default)')


def export_documents(
    client: admin_client.FirestoreAdminClient,
    database_name: str,
    collections: Optional[List[str]] = None,
    bucket_name: str = '{your-project-prefix}-backup-firestore-develop',
    gcp_project_id: str = '{your-project-prefix}-develop'
):
    # An empty collection list exports all collections
    if collections is None:
        collections = []

    bucket = f'gs://{bucket_name}'
    request = firestore_admin.ExportDocumentsRequest(
        name=database_name,
        collection_ids=collections,
        output_uri_prefix=bucket
    )

    # The export is finalized in the background (asynchronous operation)
    operation = client.export_documents(
        request=request
    )
    return operation


def backup():
    client = get_client()
    database_name = get_database_name(client)
    create_storage_bucket_if_not_exists()
    export_documents(client, database_name, [])
    logger.info('Backup operation has been started!')


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    backup()

And here you can see the backup directories created under the GCS bucket.

Upvotes: 1

guillaume blaquiere

Reputation: 75735

You can't perform system calls in a sandboxed environment (App Engine, Cloud Functions). Moreover, you don't know what is installed on the platform, so it's dangerous and not consistent.

You can try Cloud Run or App Engine flexible, but that's not really a best practice. The best way is to use the Python client library to perform the same operation programmatically. In any case, the underlying result will be the same: an API call.
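For example, since the question mentions Cloud Functions: a minimal sketch of an HTTP-triggered function (Python runtime) that starts the export through the same REST API used in the other answers. The function name and bucket are placeholders, and depending on the runtime you may have to supply the project ID yourself:

import os

from googleapiclient.discovery import build


def firestore_export(request):
    """HTTP Cloud Function that kicks off a Firestore export (async operation)."""
    # GOOGLE_CLOUD_PROJECT may not be set automatically on newer runtimes;
    # add it to the function's environment variables if needed.
    project_id = os.environ['GOOGLE_CLOUD_PROJECT']
    database_name = 'projects/{}/databases/(default)'.format(project_id)
    body = {'outputUriPrefix': 'gs://BUCKET_NAME'}  # placeholder bucket
    service = build('firestore', 'v1')
    service.projects().databases().exportDocuments(
        name=database_name, body=body).execute()
    return 'Export started'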

Upvotes: 1
