Gabriel

Reputation: 809

Loading Data into BigQuery: Direct Insert from Process vs Process and then loading through Google Drive?

I have a Google Cloud Function that generates files stored on Google Drive. I want to load those files into BigQuery.

What are the pros and cons of loading data directly from the function (skipping the file generation, just doing some kind of insert in BigQuery) vs loading from Google Drive?

I am interested not only in the technical aspects and costs, but also in the data processing methodology.

I think the question comes down to the dilemma of loading online (as the data is produced) versus loading in a batch process.

PS: This may sound like a duplicate of this post, but it is not exactly the same.

Upvotes: 0

Views: 460

Answers (1)

saifuddin778

Reputation: 7277

Files Available Locally (in Cloud Function)

If the file is generated within the cloud function (within its local environment), loading it is pretty similar to loading from your local filesystem. Here is what it comes down to:

Cons:

  1. The total file size should be <= 10 MB. If it's a CSV, it should have fewer than 16k rows.
  2. You cannot load multiple files into BQ at once; you have to iterate over the files and load each one individually.

Pros:

  1. If the file fulfills the above constraints, you save the intermediate local -> GCS upload and can load into BQ directly.
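
As a rough illustration of the direct path, here is a minimal sketch using the `google-cloud-bigquery` Python client. The helper names (`fits_direct_load`, `load_local_csv`, `csv_job_config`) are hypothetical, and the 10 MB constant is simply the limit quoted above, so adjust it to whatever applies in your setup.

```python
import os

# Assumption: the 10 MB limit quoted in the answer above.
MAX_DIRECT_LOAD_BYTES = 10 * 1024 * 1024


def fits_direct_load(path, limit=MAX_DIRECT_LOAD_BYTES):
    """Return True if the file is small enough for the direct path."""
    return os.path.getsize(path) <= limit


def csv_job_config():
    """Build a CSV load config (needs the google-cloud-bigquery package)."""
    from google.cloud import bigquery  # imported lazily, only when needed

    return bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumes the file has a header row
        autodetect=True,      # let BigQuery infer the schema
    )


def load_local_csv(client, path, table_id, job_config=None):
    """Load one local file into BigQuery and wait for the job to finish."""
    with open(path, "rb") as source_file:
        job = client.load_table_from_file(
            source_file, table_id, job_config=job_config
        )
    return job.result()  # blocks until the load job completes, raises on failure
```

Inside the function you would call it as `load_local_csv(bigquery.Client(), "/tmp/out.csv", "project.dataset.table", csv_job_config())`, once per file, which is exactly the per-file iteration listed as a con above. The path and table ID here are placeholders.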

Files Available in Cloud Storage Bucket (GCS)

On the other hand, if you decide to upload the file generated in the cloud function to GCS first and then load it into BQ from there:

Pros:

  1. You can use wildcard URIs when loading into BQ (i.e. load multiple files in a single job), significantly increasing the overall load speed.
  2. Per-file size limits are much more relaxed (4 GB for compressed and 5 TB for uncompressed CSV/JSON files).
  3. The overall load is much faster compared to loading from the local/cloud function filesystem.
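
A wildcard load from GCS might look roughly like the sketch below, again with the `google-cloud-bigquery` Python client. The bucket name, object prefix, and helper names (`wildcard_uri`, `load_from_gcs`) are made up for illustration.

```python
def wildcard_uri(bucket, prefix, extension="csv"):
    """Build a gs:// URI whose single '*' matches every object with this prefix."""
    return f"gs://{bucket}/{prefix}*.{extension}"


def load_from_gcs(client, uri, table_id, job_config=None):
    """Start one load job covering all objects matched by the URI and wait for it."""
    job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    return job.result()  # blocks until the load job completes, raises on failure


# Hypothetical usage (requires google-cloud-bigquery and real resources):
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   config = bigquery.LoadJobConfig(
#       source_format=bigquery.SourceFormat.CSV,
#       skip_leading_rows=1,
#       autodetect=True,
#   )
#   load_from_gcs(client, wildcard_uri("my-bucket", "exports/part-"),
#                 "project.dataset.table", config)
```

One job then covers every matching object, instead of one job per file as in the local case.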

Cons:

  1. Probably the only downside is that you cannot stream data into a BQ table directly from a file sitting in a GCS bucket, whereas you can stream rows from data that is available locally.
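
For the streaming case, which is also the "skip the file, just insert" option from the question, a minimal sketch could use the client's `insert_rows_json` method. The `stream_rows` helper, the table ID, and the row fields are hypothetical, and it assumes the destination table already exists with a matching schema.

```python
def stream_rows(client, table_id, rows):
    """Stream a list of JSON-compatible dicts into an existing table.

    insert_rows_json returns a list of per-row errors, empty on success.
    """
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"BigQuery rejected some rows: {errors}")
    return len(rows)


# Hypothetical usage (requires google-cloud-bigquery and an existing table):
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   stream_rows(client, "project.dataset.table",
#               [{"name": "a", "value": 1}, {"name": "b", "value": 2}])
```

Streaming makes rows queryable shortly after they are sent, while load jobs fit the batch style discussed in the question, so the choice mostly follows from how fresh the data needs to be.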

Upvotes: 2
