Ahmed Serag

Reputation: 21

Data ingestion using BigQuery's Python client exceeds Cloud Functions' maximum limit

I am trying to auto-ingest data from GCS into BigQuery using a bucket-triggered Cloud Function. The files are gzipped JSON files with a maximum size of 2 GB. The Cloud Function works fine with small files; however, it tends to time out when I give it large files in the 1 to 2 GB range. Is there a way to further optimize my function? Here is the code:

from google.cloud import bigquery, storage


def bigquery_job_trigger(data, context):
    # Set up our GCS and BigQuery clients
    storage_client = storage.Client()
    client = bigquery.Client()

    file_data = data
    file_name = file_data["name"]

    table_id = 'BqJsonIngest'
    bucket_name = file_data["bucket"]
    dataset_id = 'dataDelivery'

    dataset_ref = client.dataset(dataset_id)
    table_ref = dataset_ref.table(table_id)

    job_config = bigquery.LoadJobConfig()
    job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
    job_config.autodetect = True

    blob = storage_client.bucket(bucket_name).get_blob(file_name)
    file = blob.open("rb")
    client.load_table_from_file(
        file,
        table_ref,
        location="US",  # Must match the destination dataset location.
        job_config=job_config,
    )

Upvotes: 0

Views: 233

Answers (1)

shollyman

Reputation: 4384

If the file's already in GCS, there's no need to open the blob inside your function (or the need to do so is not apparent from the snippet provided).

See client.load_table_from_uri, or just check out one of the existing code samples like https://cloud.google.com/bigquery/docs/samples/bigquery-load-table-gcs-csv#bigquery_load_table_gcs_csv-python
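For reference, a minimal sketch of the trigger using load_table_from_uri might look like the following; the dataset and table names are assumed to be the same ones from your question, and error handling is omitted:

from google.cloud import bigquery


def bigquery_job_trigger(data, context):
    client = bigquery.Client()

    # Hand BigQuery the gs:// URI so it reads the gzipped JSON directly
    # from GCS instead of streaming the bytes through the function.
    uri = f"gs://{data['bucket']}/{data['name']}"

    dataset_ref = client.dataset('dataDelivery')
    table_ref = dataset_ref.table('BqJsonIngest')

    job_config = bigquery.LoadJobConfig()
    job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
    job_config.autodetect = True

    load_job = client.load_table_from_uri(
        uri,
        table_ref,
        location="US",  # Must match the destination dataset location.
        job_config=job_config,
    )
    # The load job runs inside BigQuery; returning without waiting keeps
    # the function well under its timeout. Call load_job.result() only if
    # you need to block until the load completes.
    print(f"Started load job {load_job.job_id} for {uri}")

Because the function only submits the load job and doesn't move the file's bytes itself, its runtime no longer depends on the file size.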

Upvotes: 1
