Kristin Day

Reputation: 21

Getting Azure SQL Server data into BigQuery

Thank you in advance for your patience. I am writing a lengthy question to attempt to provide as much relevant info as possible.

My data is stored in Azure SQL Server (not by my choice) and I want to work with the data in Google BigQuery. I would like to update the data in BigQuery from SQL Server periodically (say once an hour or once every few hours for example).

I have found many ways to pull data from SQL Server and many ways to load data into BigQuery. What I've landed on as the easiest solution for now is creating a load job in BigQuery that uses the SQL Server URI. The data in SQL Server has automatically maintained modified/created timestamps that indicate which rows have been updated or added since the last load job.
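
For reference, the incremental pull I have in mind looks roughly like this (just a sketch; the table name, the ModifiedAt column, and the saved checkpoint are placeholders for whatever my schema actually uses):

    # Sketch of the incremental pull (requires: pip install pyodbc).
    import pyodbc

    # Connection string is a placeholder for the real Azure SQL server.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=myserver.database.windows.net;"
        "DATABASE=mydb;UID=myuser;PWD=mypassword"
    )

    # Checkpoint persisted from the previous run.
    last_load_time = "2021-01-01T00:00:00"

    # Only rows modified or created since the last load job.
    rows = conn.cursor().execute(
        "SELECT * FROM dbo.MyTable WHERE ModifiedAt > ?", last_load_time
    ).fetchall()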

But I needed an IP address for BigQuery that I could add to my SQL Server whitelist to allow access to the SQL Server data. In the Google documentation, the only way I could find to get an IP address was to set up a Compute Engine VM (which I have done, and I obtained an IP address for the VM).

My question now is: how do I set up (or is it even possible to set up) the Compute Engine VM to run the BigQuery load job so that the Compute Engine VM's IP is the one used to reach SQL Server? Or, alternatively, how do I find the IP that BigQuery will use to make the request to SQL Server?

If you have any ideas - or another setup that I have not considered, please spell it out for me step-by-step. I am rather new to the industry.

Thank you again for your time and consideration.

Upvotes: 2

Views: 2784

Answers (3)

Anders Elton

Reputation: 871

Take a look at this blog post for a possible solution: https://ael-computas.medium.com/copy-sql-server-data-to-bigquery-without-cdc-c520b408bddf

You do need to move the data from SQL Server into BigQuery yourself, either in batches or by streaming it with a Debezium-based solution, for example.

At some point BigQuery might get query federation into SQL Server so you can read it directly, but at the moment only MySQL and PostgreSQL are supported (https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries).

Upvotes: 2

William Vambenepe

Reputation: 150

I'm confused by your statement that "the easiest solution for now is creating a load job in BigQuery that uses the SQL Server URI". As Felipe wrote, BigQuery load jobs won't reach into SQL Server. All they do is load data from Google Cloud Storage (GCS).

Using a Google Compute Engine (GCE) VM like you suggest would work. You'd need to run a script on the VM (e.g. bash or Python) which uses the "SQL Server URI" to retrieve data, then saves it to GCS, then starts a BigQuery load job to load the data from GCS.
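
Here's a rough sketch of what such a script could look like (purely illustrative: the connection string, query, bucket, and table names are placeholders, and it assumes the pyodbc, google-cloud-storage, and google-cloud-bigquery packages; you'd schedule it with cron or similar):

    # Minimal sketch: pull rows from SQL Server, stage them in GCS as CSV,
    # then start a BigQuery load job that reads the file from GCS.
    # Requires: pip install pyodbc google-cloud-storage google-cloud-bigquery
    import csv
    import io

    import pyodbc
    from google.cloud import bigquery, storage

    # 1. Retrieve the data from Azure SQL (connection string is a placeholder).
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=myserver.database.windows.net;"
        "DATABASE=mydb;UID=myuser;PWD=mypassword"
    )
    cursor = conn.cursor()
    cursor.execute("SELECT id, name, modified_at FROM dbo.MyTable")

    # 2. Write the rows to an in-memory CSV and upload it to a GCS bucket.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cursor.description])  # header row
    writer.writerows(cursor.fetchall())

    bucket = storage.Client().bucket("my-bucket")
    bucket.blob("exports/my_table.csv").upload_from_string(buf.getvalue())

    # 3. Start a BigQuery load job that reads the staged file from GCS.
    bq = bigquery.Client()
    job = bq.load_table_from_uri(
        "gs://my-bucket/exports/my_table.csv",
        "my_dataset.my_table",
        job_config=bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            autodetect=True,
        ),
    )
    job.result()  # block until the load job finishes

Running it on the GCE VM also means the google-cloud clients can pick up the VM's default service account credentials automatically.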

And yes, you can use the public IP address of that GCE VM to whitelist it in Azure SQL.

If your integration needs are more complex, the WePay solution at the link Felipe provided shows how to use Airflow instead of a simple script. But from the BigQuery side, both solutions are equivalent: something (a simple script, or Airflow) retrieves the data, stores it in GCS, then calls a BigQuery load job.

Upvotes: 1

Felipe Hoffa

Reputation: 59175

AFAIK BigQuery won't pull data out of other servers (you have to push the data to BigQuery instead, or have it read from a file in GCS), but here's how WePay schedules and automates MySQL->BigQuery loads:

https://wecode.wepay.com/posts/bigquery-wepay

Upvotes: 3