Reputation: 291
In the book Data Engineering with Google Cloud Platform by Adi Wijaya, to load the data from a sql database to BigQuery, the author always load the data from sql to Google Cloud Storage first, and use it as staging environment, and only after that would he load data to BigQuery
What are the advantage of going through the GCS step and not straight away into BigQuery? In which case would you load directly data from SQL db to BigQuery?
Upvotes: 2
Views: 979
Reputation: 3649
Loading to GCS as multiple CSV files and loading later to BigQuery is faster because the loading will run within the Google Cloud infrastructure instead of going back and forth across different networks. And according to the documentation, BigQuery “does not guarantee data consistency for external data sources,” so opting for this recommendation is for our benefit.
Even other tools working on BigQuery integrations follow this standard. Take for example Skyvia. Its documentation mentions Bulk Import needs a GCS Bucket and it will take care of the writing of CSVs. It will offload this task to the user. From the docs: “Skyvia writes data into multiple temporary CSV files, upload them to Google Cloud Storage and then tells Google BigQuery to import data from these CSV files.“
So, this is a standard thing that Google recommends for toolmakers and integrators.
Upvotes: 0
Reputation: 719
BigQuery doesn't support the SQL format as mentioned in this post to directly load data from Cloud SQL to BigQuery. You can follow the below procedures:
The advantages when loading data from Cloud SQL to Cloud Storage to BigQuery are:
When you load data into BigQuery from Cloud Storage, you are not charged for the load operation, but you do incur charges for storing the data in Cloud Storage.
Additional information, the cost of storing in BigQuery is higher than in Cloud storage. And you are subject to the following limitations when you load data into BigQuery from a Cloud Storage bucket.
To suggest the best strategy, your question needs more information. Still it depends on your use case. And for more information on loading data can be found in the BigQuery documentation.
Upvotes: 5