Love Sharma

Reputation: 1999

BigQuery: How to autoreload table with new storage JSON files?

I have just created a BigQuery table by linking JSON files available in Google Cloud Storage. But I do not see any option to automatically reload the table's rows as new files are added to the Google Cloud Storage folder or bucket.

Currently, I have to go to the BigQuery console and then delete and recreate the same table to load the new files. This is not scalable for us because we run a cron job against the BigQuery API. How can I auto-reload data in BigQuery?

Thanks

Upvotes: 2

Views: 2112

Answers (2)

Felipe Hoffa

Reputation: 59175

With Google Cloud Functions you can automate BigQuery each time you receive a new file:

Code-wise, declare the BigQuery client library as a dependency in package.json:

{
  "dependencies": {
    "@google-cloud/bigquery": "^0.9.6"
  } 
}

And on index.js you can act on the new file in any appropriate way:

// Require the client library before using it, then create a client.
var BigQuery = require('@google-cloud/bigquery');
var bigQuery = BigQuery({ projectId: 'your-project-id' });

// Entry point: triggered whenever a file lands in the watched bucket.
exports.processFile = (event, callback) => {
  console.log('Processing: ' + JSON.stringify(event.data));
  query(event.data);
  callback();
};

function query(data) {
  const filename = data.name.split('/').pop();
  const full_filename = `gs://${data.bucket}/${data.name}`;

  // if you want to run a query:
  const sql = '...';
  bigQuery.query({
    query: sql,
    useLegacySql: false
  });
}
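If, instead of running a query, you want to append the new file's rows to a native table, the function can build a load-job configuration from the trigger event. A minimal sketch (the bucket, path, and the choice of newline-delimited JSON are assumptions for illustration; the exact call that submits the job depends on your client-library version):

```javascript
// Build a BigQuery load-job configuration for the file that
// triggered the Cloud Function. Values here are hypothetical.
function buildLoadConfig(data) {
  return {
    sourceUris: [`gs://${data.bucket}/${data.name}`],
    sourceFormat: 'NEWLINE_DELIMITED_JSON', // the table was created from JSON files
    writeDisposition: 'WRITE_APPEND'        // append rows instead of recreating the table
  };
}

// Example: the shape of a GCS trigger payload is { bucket, name, ... }
const config = buildLoadConfig({ bucket: 'my-bucket', name: 'exports/part-000.json' });
console.log(config.sourceUris[0]); // → gs://my-bucket/exports/part-000.json
```

The resulting object would then be handed to the client library's load-job API for your dataset/table; appending avoids the delete-and-recreate cycle described in the question.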

Upvotes: 1

Mikhail Berlyant

Reputation: 173028

When you define an External Table on top of files in Google Cloud Storage, you can use a wildcard for the Source Location, so your table will represent all files that match.
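For example, the Source Location could be set to a wildcard URI like this (bucket name and path are hypothetical):

```
gs://my-bucket/json-files/*.json
```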

Then, when you query such a table, you can use the _file_name field, which will "tell" you which file a given row came from:

SELECT  
  _file_name AS file,
  *
FROM `yourTable`   

This way, whenever you add a new file to GCS, you will get it in the table "automatically".

Upvotes: 4
