Tomáš Šíma

Reputation: 864

Subscribing to Google Pub/Sub messages via Cloud Functions versus using Dataflow

I have a Pub/Sub topic with roughly 1 message published per second. Each message is around 1 KB. I need to get this data into both Cloud SQL and BigQuery in real time. The data arrives at a steady rate, and it's crucial that no message is lost or delayed. Writing a message to a destination more than once is not a problem. The total size of the data in the database is around 1 GB.

What are the advantages and disadvantages of using Google Cloud Functions triggered by the topic versus Google Dataflow to solve this problem?

Upvotes: 1

Views: 166

Answers (1)

guillaume blaquiere

Reputation: 75735

Dataflow is focused on transforming data before loading it into a sink. The streaming model of Dataflow (Beam) is very powerful when you want to compute over windowed data (aggregate, sum, count, ...). If your use case requires a steady rate, Dataflow can be a challenge when you deploy a new version of your pipeline (fortunately, this is easily solved if duplicated values aren't a problem!).

Cloud Functions is the glue of the cloud, and from your description it seems a perfect fit. On the topic, create 2 subscriptions and 2 functions (one on each subscription). One writes to BigQuery, the other to Cloud SQL. Running them in parallel gives you the lowest processing latency.
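As a rough illustration of the BigQuery side, here is a minimal sketch of a Pub/Sub-triggered function, assuming a 1st gen Python background function with the `(event, context)` signature and JSON message payloads. The table ID `my-project.my_dataset.events` and the helper names `decode_pubsub_event`, `handle_message`, and `to_bigquery` are hypothetical. The function raises on failure so that, assuming retry on failure is enabled for the function, Pub/Sub redelivers the message instead of dropping it.

```python
import base64
import json


def decode_pubsub_event(event):
    """Decode the base64 payload of a Pub/Sub-triggered event (JSON assumed)."""
    payload = base64.b64decode(event["data"]).decode("utf-8")
    return json.loads(payload)


def handle_message(event, context, insert_rows):
    """Write one message to a sink; raise on failure so the event is retried.

    insert_rows is a callable (rows, row_ids) -> list of errors, injected so
    the same logic can be exercised without any cloud dependency.
    """
    row = decode_pubsub_event(event)
    # event_id is the Pub/Sub message ID, so it stays the same on redelivery
    # and can be used as a deduplication key.
    errors = insert_rows([row], [context.event_id])
    if errors:
        raise RuntimeError(f"insert failed: {errors}")
    return row


def to_bigquery(event, context):
    """Hypothetical entry point for the BigQuery function."""
    # Imported lazily; requires the google-cloud-bigquery package.
    from google.cloud import bigquery

    client = bigquery.Client()

    def insert(rows, row_ids):
        # Streaming insert; row_ids enables best-effort deduplication.
        return client.insert_rows_json(
            "my-project.my_dataset.events", rows, row_ids=row_ids
        )

    handle_message(event, context, insert)
```

The Cloud SQL function could reuse `handle_message` with a different `insert_rows` callable, for example one that runs an upsert keyed on the message ID. Since duplicates are acceptable here, at-least-once delivery with retries is enough.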

Upvotes: 3
