Reputation: 113
I need to migrate two of my tables from Google Cloud SQL to Google BigQuery.
The data is about 1 TB in size, covering about 5 months of stock market tick data.
I understand from the documentation that I can export data as CSV and load it into BigQuery. I wish to use date-wise partitions in Google BigQuery, as suggested in the best practices. Also, I wish to do a robust verification after each date's data is migrated, to be sure the data doesn't get corrupted during transfer.
My question is, how can I write code to do this loop over dates, and within each loop:
1. export from Google Cloud SQL
2. load into Google BigQuery
3. Verify that the data has not been corrupted in transfer
Upvotes: 4
Views: 3161
Reputation: 510
You can use a BigQuery Cloud SQL federated query to copy a Cloud SQL table into BigQuery, with a single BigQuery SQL statement. For example, the following SQL copies the MySQL table sales_20191002 to the BigQuery table demo.sales_20191002.
INSERT
  demo.sales_20191002 (column1, column2 etc..)
SELECT
  *
FROM
  EXTERNAL_QUERY(
    "project.us.connection",
    "SELECT * FROM sales_20191002;");
EXTERNAL_QUERY("connection", "foreign SQL")
would execute the "foreign SQL" in the Cloud SQL database specified in "connection" and return result back to BigQuery. "foreign SQL" is the source database SQL dialect (MySQL or PostgreSQL).
Before running above SQL query, you need to create a BigQuery connection which point to your Cloud SQL database.
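To cover the per-date loop in the question, here is a minimal Python sketch that issues one such federated INSERT per day with the google-cloud-bigquery client. The connection name, dataset, column list, and date list are placeholders, and each destination table demo.sales_&lt;day&gt; is assumed to already exist with a matching schema:
from google.cloud import bigquery

client = bigquery.Client()

def copy_day(day):
    """Run the federated copy for one day; day is a string like '20191002'."""
    sql = f"""
    INSERT demo.sales_{day} (column1, column2)  -- placeholder column list
    SELECT * FROM EXTERNAL_QUERY(
      "project.us.connection",
      "SELECT * FROM sales_{day};")
    """
    client.query(sql).result()  # .result() waits for the copy job to finish

for day in ("20191001", "20191002", "20191003"):  # placeholder date list
    copy_day(day)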
Finally, you can validate data quality by comparing some high-level statistics of the two tables. For example, do the row counts match? Do GROUP BY results match?
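As a rough sketch of such a check (again with placeholder connection and table names), a per-day row-count comparison could be run from the same Python script; GROUP BY aggregates, such as summed volume per symbol, can be compared the same way:
from google.cloud import bigquery

client = bigquery.Client()

def counts_match(day):
    """Return True if BigQuery and Cloud SQL report the same row count for one day."""
    bq_rows = list(client.query(
        f"SELECT COUNT(*) AS cnt FROM demo.sales_{day}").result())[0]["cnt"]
    mysql_rows = list(client.query(
        f"""SELECT cnt FROM EXTERNAL_QUERY(
              "project.us.connection",
              "SELECT COUNT(*) AS cnt FROM sales_{day};")""").result())[0]["cnt"]
    return bq_rows == mysql_rows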
Upvotes: 3
Reputation: 946
Setting aside the fact that I would probably also choose Dataflow for a task like this, I've been working on a Python sample that does exactly what Siddharth asked for. I'd expect it to work well enough for small datasets.
It gathers a list of days from a source CloudSQL table. Then, for each day:
1) exports relevant CloudSQL rows into separate CSV files
2) loads each CSV into a separate BQ table and
3) runs a simple verification query in both MySQL and BQ
if __name__ == '__main__':
    # gather the distinct trading days present in the source CloudSQL table
    days = mysql_distinct_days()[:NUM_DAYS_TO_TEST]
    # 1) export each day to CSV and 2) load it into its own BigQuery table
    do_extract_and_load(days)
    # 3) run the MySQL-vs-BigQuery verification queries
    do_checks(days)
    if DELETE_TABLES:
        delete_tables(days)
I've shared a gist with the full source code and output in hopes that it'll prove helpful to somebody down the road. :-)
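For anyone who doesn't want to dig through the gist, a minimal, hypothetical sketch of the per-day load step (step 2 above) with the google-cloud-bigquery client might look like this; the bucket name, dataset, and file-naming convention are my assumptions, not the gist's actual code:
from google.cloud import bigquery

def load_csv_for_day(day, bucket="my-export-bucket", dataset="ticks"):
    """Load gs://<bucket>/ticks_<day>.csv into <dataset>.ticks_<day>."""
    client = bigquery.Client()
    table_id = f"{client.project}.{dataset}.ticks_{day}"
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # assumes the export wrote a header row
        autodetect=True,       # or supply an explicit schema for tick data
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    load_job = client.load_table_from_uri(
        f"gs://{bucket}/ticks_{day}.csv", table_id, job_config=job_config)
    load_job.result()  # block until the load job completes
    return client.get_table(table_id).num_rows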
Upvotes: 0
Reputation: 14791
Here's what I would do:
Upvotes: 1