Reputation: 1069
I created a new Python Google Cloud Function that schedules a query in BigQuery every 10 minutes. I tested it and it works.
Deployment works fine.
Testing gives this error: Error: memory limit exceeded. Logs not available (but I can see that the query ran as expected in BigQuery).
Using an HTTP trigger in Cloud Scheduler, I get a failure with the error message status: 503, but again, I can see in the BigQuery console that it is running as expected.
Edit: here is the code for the function:
from google.cloud import bigquery

def load(request):
    client = bigquery.Client()
    dataset_id = 'yyyyyyyy'
    table_id = 'xxxxxxxxxxx'

    # Write the query result into the destination table, replacing its contents.
    job_config = bigquery.QueryJobConfig()
    job_config.use_legacy_sql = False
    table_ref = client.dataset(dataset_id).table(table_id)
    job_config.destination = table_ref
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE

    sql = """
    SELECT * FROM `xxxxxx.datastudio.today_view`;
    """

    query_job = client.query(sql, location='asia-northeast1', job_config=job_config)
    query_job.result()  # waits for the query to finish and reads the results
    print("Job finished.")
Upvotes: 0
Views: 1860
Reputation: 76073
The BigQuery job is asynchronous. Your Cloud Function triggers it and waits for its completion. If the function fails in between, it's not a problem; the two services aren't correlated.
If you do this through the API, when you create a job (a query) you immediately get a job ID. Then you have to poll that job ID regularly to know its status. The client library does exactly the same thing!
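For illustration, here is a rough sketch of that manual polling with the Python client. The job_id value is a placeholder you would keep from the initial request; the location matches the one used in the question.

from google.cloud import bigquery
import time

client = bigquery.Client()

# Placeholder: the job ID returned when the query was first submitted.
job_id = "my_job_id"

# Re-fetch the job and check its state until BigQuery reports it as DONE.
job = client.get_job(job_id, location="asia-northeast1")
while job.state != "DONE":
    time.sleep(10)
    job = client.get_job(job_id, location="asia-northeast1")

if job.error_result:
    print("Job failed:", job.error_result)
else:
    print("Job finished.")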
Your out-of-memory issue comes from result(), which waits for completion and reads all the results. Set a page_size or max_results to limit the data returned.
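For example (a sketch; the page size value is arbitrary):

# Stream the rows page by page instead of loading everything into memory at once.
for row in query_job.result(page_size=1000):
    pass  # process each row here, or drop them if you only need the destination table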
But you can also choose not to wait for the end and exit immediately (skip the query_job.result() line). You will save Cloud Functions processing time (useless waiting), and thus money!
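For instance, the function above could return right after submitting the job, roughly like this (a sketch, not a drop-in replacement):

query_job = client.query(sql, location='asia-northeast1', job_config=job_config)
# No query_job.result() here: the job keeps running in BigQuery
# after the Cloud Function returns.
print("Job submitted: {}".format(query_job.job_id))
return "Job {} submitted".format(query_job.job_id)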
Upvotes: 2