Soumitri Pattnaik

Reputation: 3556

Pros and cons of Google Dataflow vs. Cloud Run for pulling data from an HTTP endpoint

This is a design question: we are trying to choose between Apache Beam / Google Dataflow and Cloud Run for pulling data from HTTP endpoints (the source) and loading it downstream into Google BigQuery (the sink).

Traditionally, we have implemented similar functionality with Google Dataflow, where the sources were files in a Google Cloud Storage bucket, messages in Google Pub/Sub, and so on. In those cases the data arrived in a 'push' fashion, so a streaming Dataflow job made much more sense.

However, under the new requirement the data has to be fetched periodically from an HTTP endpoint, so it sounds reasonable to use a Cloud Run service that is spun up on a schedule.

So I would like to gather the pros and cons of each approach so that we can settle on a sensible design.
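To make the scheduled-pull option concrete, here is a minimal sketch of what one Cloud Run invocation might do. It is only an illustration under assumptions: the names `fetch_records` and `run_once`, the batch size, and the idea that the endpoint returns a JSON array of objects are all hypothetical and not part of our actual setup. The sink is passed in as a plain callable so the BigQuery write can be plugged in separately.

```python
import json
import urllib.request
from typing import Callable, Iterable


def fetch_records(url: str, timeout: float = 30.0) -> list[dict]:
    """Pull one batch of records from the HTTP source.

    Assumption: the endpoint returns a JSON array of objects.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_once(
    fetch: Callable[[], Iterable[dict]],
    sink: Callable[[list[dict]], None],
    batch_size: int = 500,
) -> int:
    """Fetch once and hand the rows to the sink in batches.

    Returns the number of rows passed to the sink.
    """
    batch: list[dict] = []
    total = 0
    for record in fetch():
        batch.append(record)
        if len(batch) >= batch_size:
            sink(batch)
            total += len(batch)
            batch = []  # start a fresh list so the sink keeps its own batch
    if batch:
        # Flush the final, possibly smaller, batch.
        sink(batch)
        total += len(batch)
    return total
```

In a real deployment the `fetch` argument might be something like `lambda: fetch_records(SOURCE_URL)`, and the `sink` might wrap the BigQuery Python client's `insert_rows_json` call, with Cloud Scheduler triggering the run. Those wiring details are assumptions here and would need to be checked against the actual requirements.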

Upvotes: 1

Views: 1736

Answers (1)

al-dann

Reputation: 2725

I am not sure this question is appropriate for SO: it opens a big discussion with differing opinions, and it lacks clear context, scope, functional and non-functional requirements, time and budget constraints (including CAPEX/OPEX), who will support the solution in BAU after commissioning and how, etc.

In my personal experience, I have developed a few dozen similar pipelines using various combinations of Cloud Functions, Pub/Sub topics, Cloud Storage, Firestore (for managing pipeline process state), and so on, sometimes with Dataflow embedded in the pipelines as well, but I have never used Cloud Run. My knowledge and experience may therefore not be relevant in your case.

The only thing I can suggest: try to prioritize your requirements (in the context of the whole solution lifecycle) and then design the solution based on those priorities. I know it is a trivial idea; sorry to disappoint you.

Upvotes: 1
