Reputation: 6159
I want to set up an automated backup service for my Cloud Firestore database with a microservice built on Flask. The command I need to use is:
gcloud beta firestore export gs://[BUCKET_NAME]
That's the command I'd like to run via my App Engine microservice:
@app.route('/backup', methods=["GET", "POST"])
def backup():
    subprocess.call('gcloud beta firestore export gs://bucket-name --async', shell=True)
    return f"Backup process started successfully, you can close this window. {datetime.now(timezone.utc)}"
But it doesn't look like anything is happening; I'm assuming that's because my App Engine instance doesn't have the Cloud SDK.
Is this something I could do in Cloud Function instead?
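(For what it's worth, one quick way to see why "nothing is happening" is to check the return code that subprocess.call hands back; it does not raise when the command is missing. A minimal sketch using a deliberately bogus command name:

```python
import subprocess

# subprocess.call returns the shell's exit status instead of raising when
# the binary is missing. A nonexistent command comes back as a non-zero
# code - on App Engine standard, a missing gcloud fails the same silent way.
rc = subprocess.call('no-such-command-xyz --version', shell=True)
print('exit code:', rc)  # non-zero because the shell could not find the command
```

Logging or asserting on that code in the route would have surfaced the missing gcloud immediately.)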
Upvotes: 1
Views: 1784
Reputation: 6854
Here's an example app that you can call with the Google App Engine Cron Service. It's based on the node.js example in the docs:
app.yaml
runtime: python37
handlers:
- url: /.*
script: auto
If you already have a default service deployed, add target: cloud-firestore-admin
to create a new service.
requirements.txt
Flask
google-api-python-client
The google-api-python-client simplifies access to the Cloud Firestore REST API.
main.py
import datetime
import os

from googleapiclient.discovery import build
from flask import Flask, request

app = Flask(__name__)


@app.route('/cloud-firestore-export')
def export():
    # Deny if not from the GAE Cron Service
    # (use .get() so a missing header fails the assert instead of raising KeyError)
    assert request.headers.get('X-Appengine-Cron')

    # Deny if outputUriPrefix not set correctly
    outputUriPrefix = request.args.get('outputUriPrefix')
    assert outputUriPrefix and outputUriPrefix.startswith('gs://')

    # Use a timestamp in the export file name
    timestamp = datetime.datetime.now().strftime('%Y%m%d-%H%M%S')
    if not outputUriPrefix.endswith('/'):
        # Add a trailing slash if missing
        outputUriPrefix += '/' + timestamp
    else:
        outputUriPrefix += timestamp

    if 'collections' in request.args:
        collections = request.args.get('collections').split(",")
    else:
        collections = None

    body = {
        'collectionIds': collections,
        'outputUriPrefix': outputUriPrefix,
    }

    # Build REST API request for
    # https://cloud.google.com/firestore/docs/reference/rest/v1/projects.databases/exportDocuments
    project_id = os.environ.get('GOOGLE_CLOUD_PROJECT')
    database_name = 'projects/{}/databases/(default)'.format(project_id)
    service = build('firestore', 'v1')
    service.projects().databases().exportDocuments(name=database_name, body=body).execute()

    return 'Operation started'


if __name__ == '__main__':
    # This is used when running locally only. When deploying to Google App
    # Engine, a webserver process such as Gunicorn will serve the app. This
    # can be configured by adding an `entrypoint` to app.yaml.
    # Flask's development server will automatically serve static files in
    # the "static" directory. See:
    # http://flask.pocoo.org/docs/1.0/quickstart/#static-files. Once deployed,
    # App Engine itself will serve those files as configured in app.yaml.
    app.run(host='127.0.0.1', port=8080, debug=True)
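The outputUriPrefix handling above can be exercised on its own, with no GCP access needed. A standalone sketch (the function name is mine, the logic mirrors the handler):

```python
import datetime

def build_export_prefix(output_uri_prefix: str, now: datetime.datetime) -> str:
    """Mirror of the handler's prefix logic: ensure one slash, append timestamp."""
    timestamp = now.strftime('%Y%m%d-%H%M%S')
    if not output_uri_prefix.endswith('/'):
        output_uri_prefix += '/' + timestamp
    else:
        output_uri_prefix += timestamp
    return output_uri_prefix

now = datetime.datetime(2024, 1, 2, 3, 4, 5)
print(build_export_prefix('gs://my-bucket', now))   # gs://my-bucket/20240102-030405
print(build_export_prefix('gs://my-bucket/', now))  # gs://my-bucket/20240102-030405
```

Either way the export lands in a distinct, timestamped directory under the bucket, so repeated cron runs never collide.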
cron.yaml
cron:
- description: "Daily Cloud Firestore Export"
  url: /cloud-firestore-export?outputUriPrefix=gs://BUCKET_NAME&collections=COLLECTIONS_LIST
  schedule: every 24 hours
If you deployed to a non-default service in app.yaml, add it here, too: target: cloud-firestore-admin
Access permissions for App Engine Service account
Once deployed, the app uses the GAE service account to authorize export requests. Make sure your GAE service account has permissions for Cloud Firestore and for your Storage bucket, see:
https://cloud.google.com/firestore/docs/solutions/schedule-export#configure_access_permissions
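As a rough sketch of that setup (placeholders like PROJECT_ID and BUCKET_NAME are yours to fill in; the default GAE service account is PROJECT_ID@appspot.gserviceaccount.com), the grants look something like:

```shell
# Allow the App Engine service account to run Firestore import/export operations
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member serviceAccount:PROJECT_ID@appspot.gserviceaccount.com \
    --role roles/datastore.importExportAdmin

# Give the same account write access to the export bucket
gsutil iam ch \
    serviceAccount:PROJECT_ID@appspot.gserviceaccount.com:roles/storage.admin \
    gs://BUCKET_NAME
```

Check the linked docs for the exact roles; narrower bucket-level roles may be enough depending on your setup.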
Upvotes: 4
Reputation: 63
An up-to-date and more convenient way could be the one below.
PS1: I have just run this code and copied it as-is.
PS2: The default function arguments might be confusing, e.g. '{your-project-prefix}-develop' could be gcp-project-id-develop
or gcp-project-id-staging,
i.e. the GCP project you will run the code against.
# pylint: disable=missing-module-docstring,missing-function-docstring,import-outside-toplevel
# pylint: disable=too-many-arguments,unused-argument,no-value-for-parameter
import os
import logging
from typing import List

from google.cloud import storage
from google.cloud.firestore_admin_v1.services.firestore_admin import client as admin_client
from google.cloud.firestore_admin_v1.types import firestore_admin

logger = logging.getLogger(__name__)


def create_storage_bucket_if_not_exists(
    gcp_project_id: str = '{your-project-prefix}-develop',
    bucket_name: str = '{your-project-prefix}-backup-firestore-develop'
):
    storage_client = storage.Client(project=gcp_project_id)
    bucket = storage_client.bucket(bucket_name)
    if not bucket.exists():
        bucket.storage_class = 'STANDARD'
        storage_client.create_bucket(bucket, location='us')


def get_client(
    service_account: str = '{your-service-account-path}'
):
    # Point the admin client at the desired credentials file
    os.environ.pop('GOOGLE_APPLICATION_CREDENTIALS', None)
    os.environ.setdefault('GOOGLE_APPLICATION_CREDENTIALS', service_account)
    firestore_admin_client = admin_client.FirestoreAdminClient()
    return firestore_admin_client


def get_database_name(
    client: admin_client.FirestoreAdminClient,
    gcp_project_id: str = '{your-project-prefix}-develop'
):
    return client.database_path(gcp_project_id, '(default)')


def export_documents(
    client: admin_client.FirestoreAdminClient,
    database_name: str,
    collections: List[str] = None,
    bucket_name: str = '{your-project-prefix}-backup-firestore-develop',
    gcp_project_id: str = '{your-project-prefix}-develop'
):
    if collections is None:
        collections = []
    bucket = f'gs://{bucket_name}'
    request = firestore_admin.ExportDocumentsRequest(
        name=database_name,
        collection_ids=collections,
        output_uri_prefix=bucket
    )
    # The export is finalized in the background - async
    operation = client.export_documents(
        request=request
    )
    return operation


def backup():
    client = get_client()
    database_name = get_database_name(client)
    create_storage_bucket_if_not_exists()
    export_documents(client, database_name, [])
    logger.info('Backup operation has been started!')


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    backup()
After running it, you can see the timestamped backup directories under the GCS bucket.
Upvotes: 1
Reputation: 75735
You can't perform system calls in a sandboxed environment (App Engine standard, Cloud Functions). Moreover, you don't know what is installed on the platform, so relying on it is dangerous and not consistent.
You can try Cloud Run or App Engine flex, but that's not really a best practice. The best way is to use a Python library to perform the same operation programmatically. In any case, the underlying result will be the same: an API call.
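To make "it's the same API call" concrete: whether you go through gcloud, the discovery client, or the Admin SDK, the request ends up at the same Firestore Admin REST endpoint. A small sketch (the helper function is mine; the URL shape follows the v1 exportDocuments reference):

```python
# Build the Firestore Admin v1 exportDocuments endpoint URL for a project.
# This is only the target URL; an actual call would need an OAuth2 token.
def export_documents_url(project_id: str, database: str = '(default)') -> str:
    name = f'projects/{project_id}/databases/{database}'
    return f'https://firestore.googleapis.com/v1/{name}:exportDocuments'

print(export_documents_url('my-project'))
# https://firestore.googleapis.com/v1/projects/my-project/databases/(default):exportDocuments
```

The client libraries just add authentication, retries, and typed request/response objects on top of a POST to that URL.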
Upvotes: 1