Bill
Bill

Reputation: 3209

Sync GCP PostgreSQL tables to GCP BigQuery

We are currently using Stitchdata to periodically sync data over from a PostgreSQL instance to a BigQuery instance. The data is used for building reports. Stitchdata allows a pipeline to be built with a few checkboxes and options without the need for coding.

I am wondering if Google Cloud Platform offers a solution to allow a PostgreSQL database (hosted with GCP) to be synced to a BigQuery table. This is for the purpose of data sovereignty and user data privacy issues.

We do not want to use federated queries since we only want certain columns to be imported to BigQuery from PostgreSQL.

I have looked at:

I am curious if I am missing anything glaring obvious with what Google Cloud Platform offers to sync data from our PostgreSQL database into BgigQuery.

Cheers

Upvotes: 9

Views: 2132

Answers (2)

Chris Lalande
Chris Lalande

Reputation: 213

I would reconsider looking at federated queries, and for users who shouldn't have access to certain columns, create a IAM role that handles column level access.

Alternatively, federated queries can be used to define a View that excludes the columns that you want to exclude.

The downside to this approach that every user who will have access to query that view must have an identify on and access into all the data sources that define it.

Finally, you could use a Scheduled Query that runs as an identity that does have access to both sources, using that same federated query to define the data to pull. I would only take that approach if it is not practical or appropriate from an IAM approach to use a federated query - if you can avoid replicating data, you both save on storage as well as remove data latency.

Upvotes: 0

Alexander Goida
Alexander Goida

Reputation: 364

GCP Datastream supports now CDC from PostgreSQL to BigQuery. There are currently some issues which we have experienced though. For example, it's not possible to use partitioned tables on BigQuery side, bcs Datastream creates them and we don't have control over this operation. If we are synchronizing a really big tables from PostgreSQL and then building some ETLs, we will scan a lot of data.

Upvotes: 0

Related Questions