Nikhil Suthar

Reputation: 2431

How to automatically transfer newly added avro data from GCS to BigQuery

I want to schedule a data transfer job from Cloud Storage to BigQuery. I have an application that continuously dumps data to a GCS bucket path (say gs://test-bucket/data1/*.avro), and I want that data moved to BigQuery as soon as each object is created in GCS.

I don't want to migrate all the files in the folder again and again; I only want to move the objects newly added since the last run.

The BigQuery Data Transfer Service is available and accepts Avro files as input, but it does not take a folder, and it transfers all objects rather than only the newly added ones.

I am new to this, so I might be missing some functionality. How can I achieve it?

Please note: I want to schedule a job to load data at a certain frequency (every 10 or 15 minutes). I don't want a trigger-based solution, since the number of objects generated will be huge.
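To make the constraint concrete, here is a minimal sketch of the scheduled incremental approach: on each run, list the objects in the prefix, keep only those updated after the previous run's timestamp, and batch-load just those URIs. The bucket, dataset, and table names are placeholders, and the actual GCP calls are shown in comments since they need credentials; only the selection logic is implemented.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

BUCKET = "test-bucket"  # placeholder bucket name from the question

@dataclass
class ObjectInfo:
    """Minimal stand-in for a google.cloud.storage blob (name + updated time)."""
    name: str
    updated: datetime

def select_new_uris(objects, last_run):
    """Return gs:// URIs for .avro objects updated strictly after last_run."""
    return [
        f"gs://{BUCKET}/{o.name}"
        for o in objects
        if o.name.endswith(".avro") and o.updated > last_run
    ]

# In the real scheduled job (Cloud Scheduler -> Cloud Run / cron), roughly:
#   from google.cloud import storage, bigquery
#   blobs = storage.Client().list_blobs(BUCKET, prefix="data1/")
#   uris = select_new_uris(blobs, last_run)   # last_run persisted between runs
#   if uris:
#       job_config = bigquery.LoadJobConfig(
#           source_format=bigquery.SourceFormat.AVRO,
#           write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
#       )
#       bigquery.Client().load_table_from_uri(
#           uris, "my_project.my_dataset.data1", job_config=job_config
#       ).result()
```

Persisting `last_run` (for example in a small state table or GCS object) is what keeps each run incremental rather than reloading the whole folder.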

Upvotes: 0

Views: 498

Answers (1)

Piotr Klos

Reputation: 104

You can use a Cloud Function with a Storage event trigger: launch a Cloud Function that loads the data into BigQuery whenever a new file arrives. https://cloud.google.com/functions/docs/calling/storage EDIT: If you have more than 1500 loads per table per day, you can work around that limit by loading via the BigQuery Storage Write API.
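A sketch of that trigger-based function, assuming a `google.storage.object.finalize` trigger on the bucket; the table name and the `data1/` prefix filter are assumptions, and the BigQuery load call is left as a comment since it can't run without credentials:

```python
def should_load(event):
    """Keep only finalize events for .avro objects under the data1/ prefix."""
    name = event.get("name", "")
    return name.startswith("data1/") and name.endswith(".avro")

def gcs_uri(event):
    """Build the gs:// URI for the object that fired the event."""
    return f"gs://{event['bucket']}/{event['name']}"

def load_new_object(event, context):
    """Cloud Function entry point for a google.storage.object.finalize trigger."""
    if not should_load(event):
        return
    # from google.cloud import bigquery
    # client = bigquery.Client()
    # job_config = bigquery.LoadJobConfig(
    #     source_format=bigquery.SourceFormat.AVRO)
    # client.load_table_from_uri(gcs_uri(event),
    #     "my_project.my_dataset.data1", job_config=job_config).result()
```

Note that this runs one load job per object, which is where the 1500-loads-per-table-per-day quota bites if the object volume is high, hence the Storage Write API workaround above.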

If you do not need top performance, you can simply create an external table over that folder and query it, instead of loading every file.
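For the external-table route, a DDL sketch over the wildcard path from the question; the project, dataset, and table names are placeholders:

```sql
CREATE EXTERNAL TABLE `my_project.my_dataset.data1_external`
OPTIONS (
  format = 'AVRO',
  uris = ['gs://test-bucket/data1/*.avro']
);
```

Queries against this table read the bucket at query time, so newly added objects matching the wildcard are picked up automatically with no load jobs at all, at the cost of slower scans than native BigQuery storage.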

Upvotes: 1
