Shan

Reputation: 197

Google Cloud Spanner real time Change Data Capture to PubSub/Kafka through Cloud Data Fusion or Others

I would like to build a real-time change data capture (CDC) pipeline, preferably log-based, from Google Cloud Spanner to Pub/Sub or Kafka for my downstream real-time applications. Is there a good, cost-effective way to achieve that? I would appreciate any advice and recommendations.

I also looked at Google's Cloud Data Fusion. It can replicate from MySQL/PostgreSQL to Cloud Spanner in real time, but I could not find a way to go from Cloud Spanner to Pub/Sub or Kafka in real time.

I also found two other approaches, listed here for comments or suggestions:

  1. Use Debezium, a log-based change data capture Kafka connector: https://cloud.google.com/architecture/capturing-change-logs-with-debezium#deploying_debezium_on_gke_on_google_cloud
  2. Create a polling service (which may miss some data) that polls Cloud Spanner: https://cloud.google.com/architecture/deploying-event-sourced-systems-with-cloud-spanner

If you have any suggestions or comments on these, I would be really grateful.

Upvotes: 1

Views: 1097

Answers (3)

Eike

Reputation: 638

To close the loop here, Kafka and Pub/Sub support has been added for Spanner change streams:

Spanner change streams to Pub/Sub Dataflow template: https://cloud.google.com/dataflow/docs/guides/templates/provided/cloud-spanner-change-streams-to-pubsub

Debezium-based Kafka connector to stream changes into topic partitions: https://cloud.google.com/spanner/docs/change-streams/use-kafka and https://debezium.io/documentation/reference/stable/connectors/spanner.html

Upvotes: 0

Derek Downey

Reputation: 1532

Cloud Spanner has a new feature called Change Streams that allows building a downstream pipeline from Spanner to Pub/Sub or Kafka.

At this time, there's not a pre-packaged Spanner to PubSub/Kafka connector.

Currently, you can read change streams either with the SpannerIO Apache Beam connector, which lets you build the pipeline with Dataflow, or by querying the change stream API directly.
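To make the "query the API directly" option more concrete, here is a minimal sketch of how the change stream query for a GoogleSQL database might be assembled. A change stream is read through a generated `READ_<stream name>` table-valued function; the stream name `OrdersStream`, the helper names, and the conservative validation rules are assumptions made for illustration, not part of the answer above.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Conservative identifier check. This is an assumption of the sketch,
# not a full statement of Spanner's naming rules.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _rfc3339(value: datetime) -> str:
    """Format a timezone-aware datetime as a UTC RFC 3339 timestamp."""
    if value.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def _string_literal(value: str) -> str:
    """Quote a value, rejecting characters that would need escaping."""
    if any(ch in value for ch in "\"'\\\n\r"):
        raise ValueError("unsupported character in string value")
    return f'"{value}"'


def build_change_stream_query(
    stream_name: str,
    start: datetime,
    end: Optional[datetime] = None,
    partition_token: Optional[str] = None,
    heartbeat_ms: int = 10_000,
) -> str:
    """Build the SQL text that reads one partition of a change stream.

    A NULL partition token asks for the initial (root) partitions; a NULL
    end timestamp keeps the query open for new changes.
    """
    if not _IDENTIFIER.fullmatch(stream_name):
        raise ValueError(f"invalid change stream name: {stream_name!r}")
    if heartbeat_ms <= 0:
        raise ValueError("heartbeat_ms must be positive")
    end_sql = "NULL" if end is None else _string_literal(_rfc3339(end))
    token_sql = (
        "NULL" if partition_token is None else _string_literal(partition_token)
    )
    return (
        f"SELECT ChangeRecord FROM READ_{stream_name} (\n"
        f"  start_timestamp => {_string_literal(_rfc3339(start))},\n"
        f"  end_timestamp => {end_sql},\n"
        f"  partition_token => {token_sql},\n"
        f"  heartbeat_milliseconds => {heartbeat_ms}\n"
        ")"
    )


if __name__ == "__main__":
    start = datetime(2022, 5, 1, 9, 0, tzinfo=timezone.utc)
    print(build_change_stream_query("OrdersStream", start))
```

With the google-cloud-spanner Python client, a string like this would typically be run with `execute_sql` on a single-use read-only snapshot. The results include child partition tokens that each need their own follow-up query, which is the bookkeeping the SpannerIO connector handles for you.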

Disclaimer: I'm a Developer Advocate who works with the Cloud Spanner team.

Upvotes: 1

Knut Olav Løite

Reputation: 3532

There is an open-source implementation of a polling service for Cloud Spanner that can also automatically push changes to Pub/Sub: https://github.com/cloudspannerecosystem/spanner-change-watcher

It is, however, not log-based, and it has some inherent limitations:

  • It can miss updates if the same record is updated twice within the polling interval. In that case, only the last value will be reported.
  • It only supports soft deletes.
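The first limitation can be illustrated with a small in-memory simulation. This is only a sketch of the general "poll by commit timestamp" idea, not the actual implementation of spanner-change-watcher; the `Row` type, the `poll` and `simulate` helpers, and the sample order data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Row:
    value: str
    last_updated: int  # simulated commit timestamp in milliseconds


def poll(table: Dict[str, Row], since: int) -> List[Tuple[str, str]]:
    """Return (key, value) for rows changed after `since`.

    This mimics a query such as `WHERE last_updated > @since`: it only
    ever sees the current state of each row, never intermediate states.
    """
    return [
        (key, row.value)
        for key, row in sorted(table.items())
        if row.last_updated > since
    ]


def simulate(
    writes: List[Tuple[int, str, str]], poll_times: List[int]
) -> List[Tuple[str, str]]:
    """Apply timestamped writes to a table and poll it at fixed times."""
    table: Dict[str, Row] = {}
    reported: List[Tuple[str, str]] = []
    pending = sorted(writes)
    last_poll = 0
    for now in sorted(poll_times):
        # Apply every write committed up to this poll; later writes to
        # the same key overwrite earlier ones.
        while pending and pending[0][0] <= now:
            ts, key, value = pending.pop(0)
            table[key] = Row(value, ts)
        reported.extend(poll(table, last_poll))
        last_poll = now
    return reported


if __name__ == "__main__":
    writes = [
        (100, "order-1", "CREATED"),
        (400, "order-1", "PAID"),     # overwritten before the next poll
        (700, "order-1", "SHIPPED"),
    ]
    for change in simulate(writes, poll_times=[200, 1000]):
        print(change)
```

Here the poller reports `CREATED` and `SHIPPED` but never `PAID`, because both later updates fall inside the same polling interval. A log-based approach such as change streams records each committed change separately.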

You could look at the samples to see whether it suits your needs, at least to some degree: https://github.com/cloudspannerecosystem/spanner-change-watcher/tree/master/samples

Upvotes: 1
