Kaisar Bhuiyan

Reputation: 11

Load batch CSV Files from Cloud Storage to BigQuery and append on same table

I am new to GCP and recently created a bucket on Google Cloud Storage. Raw files in CSV format are dumped into the bucket every hour.

I would like to load all the CSV files from Cloud Storage into BigQuery, and then schedule a job that loads the most recent files from Cloud Storage and appends the data to the same BigQuery table.

Please help me set this up.

Upvotes: 0

Views: 1500

Answers (1)

guillaume blaquiere

Reputation: 75775

There are many options, but I will present only two:

  1. Do nothing and use an external table in BigQuery. That means you leave the data in Cloud Storage and have BigQuery read it directly from there. You don't duplicate the data (and you pay less for storage), but queries are slower (BigQuery has to read the data from less performant storage and parse the CSV on the fly), and every query processes all the files. You also can't use advanced BigQuery features such as partitioning and clustering.
  2. Perform a BigQuery load operation to load all the existing files into a BigQuery table (I recommend partitioning the table if you can). For new files, forget the old-school scheduled ingestion process. In the cloud, you can be event driven: catch the event that signals a new file in Cloud Storage and load that file directly into BigQuery. You have to write a small Cloud Function for that, but it's the most efficient and the most recommended pattern. You can find a code sample here
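To make the two options concrete, here is a minimal Python sketch using the `google-cloud-bigquery` client. It is not the linked sample. The table IDs, the function names, the single header row, and schema autodetection are all assumptions made for illustration. `load_new_file` is written as a background Cloud Function for a Cloud Storage "finalize" trigger, where `event` carries the `bucket` and `name` of the new object.

```python
# All resource names below are hypothetical placeholders.
TABLE_ID = "my-project.my_dataset.hourly_raw"               # native table (option 2)
EXTERNAL_TABLE_ID = "my-project.my_dataset.hourly_raw_ext"  # external table (option 1)


def csv_uri_from_event(event):
    """Return the gs:// URI of the uploaded object, or None if it is not a CSV."""
    name = event.get("name", "")
    if not name.lower().endswith(".csv"):
        return None
    return f"gs://{event['bucket']}/{name}"


def create_external_table(bucket):
    """Option 1: define a table whose data stays in Cloud Storage."""
    # Imported lazily so the pure helper above works without the library installed.
    from google.cloud import bigquery

    config = bigquery.ExternalConfig("CSV")
    config.source_uris = [f"gs://{bucket}/*.csv"]  # wildcard covers every CSV
    config.autodetect = True                       # assumption: let BigQuery infer the schema
    config.options.skip_leading_rows = 1           # assumption: one header row per file
    table = bigquery.Table(EXTERNAL_TABLE_ID)
    table.external_data_configuration = config
    return bigquery.Client().create_table(table)


def load_new_file(event, context):
    """Option 2: Cloud Function entry point, appends one new CSV to the table."""
    uri = csv_uri_from_event(event)
    if uri is None:
        return None  # ignore objects that are not CSV files
    from google.cloud import bigquery

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumption: one header row per file
        autodetect=True,      # assumption: let BigQuery infer the schema
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    job = bigquery.Client().load_table_from_uri(uri, TABLE_ID, job_config=job_config)
    job.result()  # block until the load finishes; raises if the job failed
    return job.job_id
```

For the initial backfill of existing files, the same `LoadJobConfig` can be used with a wildcard URI such as `gs://<bucket>/*.csv`, which loads all matching files in a single load job.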

Just a warning about the second solution: you can perform "only" 1500 load jobs per day per table (about 1 per minute).
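As a quick sanity check on that limit, the arithmetic can be sketched as follows. The 1500 figure is the one quoted above and may differ from the current quota; the function names are made up for this example, and it assumes one load job per file.

```python
LOAD_JOBS_PER_TABLE_PER_DAY = 1500  # limit quoted in this answer
MINUTES_PER_DAY = 24 * 60           # 1440, so 1500 jobs is roughly one per minute


def load_jobs_per_day(files_per_hour):
    """One load job per file, so the daily job count is files per hour times 24."""
    return files_per_hour * 24


def within_quota(files_per_hour, limit=LOAD_JOBS_PER_TABLE_PER_DAY):
    """True if an event-driven loader stays at or under the daily limit."""
    return load_jobs_per_day(files_per_hour) <= limit


print(load_jobs_per_day(1))  # one file per hour, as in the question
print(within_quota(1))
```

With one file per hour, as in the question, that is 24 load jobs per day, far below the limit. The limit only becomes a concern at more than about 62 files per hour for a single table.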

Upvotes: 1
