cockitos

Reputation: 15

Cloud Function with too many insertions in BigQuery

Good morning,

I'm developing a Cloud Function that, when someone deposits a file in a bucket, moves it to another bucket, performs an INSERT in BigQuery (pulling data from another table using information extracted from the file name), and then deletes the file from the original bucket.

The problem is that usually about 100 files are deposited at once, and the function fails because too many INSERT statements run against BigQuery at the same time. When that happens, the row for a file is never inserted and the file is not deleted from the original bucket.

The function does something like this:

const { BigQuery } = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function archive(data, context) {
  const file = data;

  console.log(`Detected new or overwritten file 'gs://${file.bucket}/${file.name}'`); // 'finalize' GCS trigger

  // Parse metadata out of the file name, e.g. 0023_ROMERO_2021_02.xml
  const filepath = file.name.split('/');
  const filename = filepath.slice(-1).join('/');
  const filenameWithoutExtension = filename.split('.').slice(0, -1).join('.');
  const filenameData = filenameWithoutExtension.split('_');

  const fileData = {
    id_file: filenameData[0],
    name: filenameData[1],
    year: filenameData[2],
    month: filenameData[3]
  };

  // Copy the file to the destination bucket
  await copyFile(srcBucketName, srcFilename, destBucketName,...)

  // Insert a row built from another table, filtered by the values parsed from the file name
  const [job] = await bigquery.createQueryJob({
    query: `INSERT INTO table1 (....)
    SELECT
      ...
      timestamp('${file.timeCreated}')
    FROM
      table2
    WHERE
      month = '${fileData.month}'
      and year = '${fileData.year}'
      and id_file = '${fileData.id_file}'
      `,
    location: 'EU',
  });

  console.log(`BigQuery job ${job.id} started.`);

  // Wait for the INSERT to finish before deleting the source file
  await job.getQueryResults();
  console.log(`BigQuery job ${job.id} completed.`);

  await deleteFile(file.bucket, file.name);

  console.log('Completed');
}

Is there a way to solve this problem?

Thanks in advance

Upvotes: 0

Views: 148

Answers (1)

Vibhor Gupta

Reputation: 699

BigQuery has quota limitations that you cannot override (only a limited number of DML INSERT statements can run or be queued concurrently against the same table, so a burst of ~100 simultaneous inserts will start failing), but you can customize your process to work within them.

Solution prototype:

  1. Create a wrapper script/code that is responsible for listing all files that have landed in the GCS bucket.
  2. In that wrapper, loop over the list and process the files sequentially, or in small parallel batches. Each sequential/small-parallel iteration then calls your function for one file (see the sketch below).
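
For illustration, here is a minimal sketch of what such a wrapper could look like in Node.js. It assumes the logic from the question is exported as archive() from a local module, and SRC_BUCKET and BATCH_SIZE are placeholders you would replace with your own values:

const { Storage } = require('@google-cloud/storage');
const { archive } = require('./archive'); // hypothetical module exporting the archive() function from the question

const storage = new Storage();
const SRC_BUCKET = 'my-source-bucket'; // placeholder: bucket where the files land
const BATCH_SIZE = 5;                  // small parallel batch; set to 1 for strictly sequential processing

async function processPendingFiles() {
  // List every file currently sitting in the source bucket
  const [files] = await storage.bucket(SRC_BUCKET).getFiles();

  // Process the files a few at a time so only a handful of BigQuery INSERTs run concurrently
  for (let i = 0; i < files.length; i += BATCH_SIZE) {
    const batch = files.slice(i, i + BATCH_SIZE);
    await Promise.all(
      batch.map((f) =>
        archive({ bucket: SRC_BUCKET, name: f.name, timeCreated: f.metadata.timeCreated }, {})
      )
    );
  }
}

processPendingFiles().catch(console.error);

This keeps the number of simultaneous INSERT jobs bounded by BATCH_SIZE instead of by the number of files that arrive at once.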

Upvotes: 1
