I'm working on real-time stream processing for IoT data using GCP Pub/Sub and Cloud Dataflow, with analytics performed in BigQuery. I'm looking for help on how to implement this. Here is the architecture for IoT real-time stream processing:
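For context, the device side of such an architecture usually just serializes each reading and publishes it to a Pub/Sub topic. A minimal sketch, assuming JSON payloads with hypothetical fields `device_id`, `temperature_c` and `ts`; `publish_reading` needs the `google-cloud-pubsub` package and credentials, and the project/topic names are placeholders:

```python
import json
import time
from typing import Any, Dict, Optional


def encode_reading(device_id: str, temperature_c: float,
                   ts: Optional[float] = None) -> bytes:
    """Serialize one (hypothetical) sensor reading as UTF-8 JSON bytes.

    Pub/Sub message payloads are raw bytes, so the dict is encoded here.
    """
    payload: Dict[str, Any] = {
        "device_id": device_id,
        "temperature_c": temperature_c,
        "ts": time.time() if ts is None else ts,
    }
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def publish_reading(project_id: str, topic_id: str, data: bytes) -> str:
    """Publish one message and block until Pub/Sub returns its message ID."""
    # Lazy import so encode_reading() works without the client library installed.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    future = publisher.publish(topic_path, data)
    return future.result()
```

Usage would look like `publish_reading("my-project", "iot-readings", encode_reading("sensor-1", 21.5))`, where both names are placeholders.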
I'm assuming you mean that you want to stream some sort of data from outside the Google Cloud Platform into BigQuery.
Unless you're transforming the data somehow, I don't think Dataflow is necessary.
Note that BigQuery has its own Streaming API, so you don't necessarily have to use Pub/Sub to get data into BigQuery.
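As a rough illustration of streaming straight into BigQuery without Pub/Sub, here is a minimal sketch using `insert_rows_json` from the `google-cloud-bigquery` client, which returns a list of per-row errors (empty on success). The column names and the `data_dump` table ID are assumptions for the example; the client is injectable so the logic can be exercised without credentials:

```python
from typing import Any, Dict, Iterable, List, Optional

# Columns of a hypothetical raw table; adjust to your own schema.
DATA_DUMP_COLUMNS = ("device_id", "temperature_c", "ts")


def to_bq_rows(readings: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Project each reading onto the table's columns (missing keys become None)."""
    return [{col: r.get(col) for col in DATA_DUMP_COLUMNS} for r in readings]


def stream_rows(table_id: str, rows: List[Dict[str, Any]],
                client: Optional[Any] = None) -> int:
    """Stream rows via the BigQuery Streaming API and return how many were sent."""
    if client is None:
        # Lazy import: needs google-cloud-bigquery and application credentials.
        from google.cloud import bigquery
        client = bigquery.Client()
    # insert_rows_json returns a list of per-row error mappings, empty on success.
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"streaming insert reported errors: {errors}")
    return len(rows)
```

Something like `stream_rows("my-project.iot.data_dump", to_bq_rows(readings))` would then push raw readings directly into the table (table ID is a placeholder).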
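In any case, there are generally two approaches you can take:

1. Stream the data directly into BigQuery using its Streaming API.
2. Publish the data to Pub/Sub, process it with Dataflow, and write the results to BigQuery.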
If you just want to put very raw data (no processing) into BQ, then I'd suggest using the first method.
If you actually want to transform the data somehow, then I'd use the second method, as it allows you to massage the data first.
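A minimal sketch of that second route, assuming the Apache Beam Python SDK (`apache-beam[gcp]`) run on Dataflow: messages are read from a Pub/Sub subscription as bytes, transformed, and appended to a BigQuery table. The Fahrenheit column is only a hypothetical stand-in for whatever "massaging" you need, and the subscription/table names and schema are placeholders:

```python
import json
from typing import Any, Dict, List, Optional


def parse_and_enrich(message: bytes) -> Dict[str, Any]:
    """Decode a JSON Pub/Sub payload and add a derived column.

    The Celsius-to-Fahrenheit conversion is a placeholder transformation.
    """
    reading = json.loads(message.decode("utf-8"))
    reading["temperature_f"] = reading["temperature_c"] * 9 / 5 + 32
    return reading


def run(subscription: str, table: str, argv: Optional[List[str]] = None) -> None:
    """Streaming pipeline: Pub/Sub -> parse_and_enrich -> BigQuery."""
    # Lazy imports so parse_and_enrich() can be tested without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # argv carries runner flags such as --runner=DataflowRunner --project=...
    options = PipelineOptions(argv, streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadPubSub" >> beam.io.ReadFromPubSub(subscription=subscription)
            | "Transform" >> beam.Map(parse_and_enrich)
            | "WriteBQ" >> beam.io.WriteToBigQuery(
                table,
                schema="device_id:STRING,temperature_c:FLOAT,temperature_f:FLOAT,ts:FLOAT",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )
```

Keeping the transformation in a plain function like `parse_and_enrich` means it can be unit-tested without starting a pipeline.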
That said, I'd generally recommend the first method even if you do want to transform the data. That way, you have a `data_dump` table (raw data) in your dataset, and you can still use Dataflow afterwards to transform the data and write it to an `aggregated` table.

This gives you maximum flexibility, because you can create potentially n transformed datasets from the single `data_dump` table in BQ.
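To make the `data_dump` to `aggregated` idea concrete, here is a minimal batch sketch, again assuming the Apache Beam Python SDK on Dataflow: it reads the raw table, computes a per-device mean (a hypothetical aggregation), and overwrites the aggregated table. Table specs such as `my-project:iot.data_dump` are placeholders, and the export-based `ReadFromBigQuery` needs a `--temp_location` GCS path in `argv`. `mean_per_device` is a plain-Python reference for the same computation:

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Optional


def mean_per_device(rows: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    """Plain-Python equivalent of the pipeline's per-device mean."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for r in rows:
        sums[r["device_id"]] += r["temperature_c"]
        counts[r["device_id"]] += 1
    return {device: sums[device] / counts[device] for device in sums}


def run_aggregation(source_table: str, target_table: str,
                    argv: Optional[List[str]] = None) -> None:
    """Batch pipeline: data_dump -> mean per device -> aggregated table."""
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    with beam.Pipeline(options=PipelineOptions(argv)) as p:
        (
            p
            | "ReadDump" >> beam.io.ReadFromBigQuery(table=source_table)
            | "KeyByDevice" >> beam.Map(lambda r: (r["device_id"], r["temperature_c"]))
            | "MeanPerDevice" >> beam.combiners.Mean.PerKey()
            | "ToRow" >> beam.Map(lambda kv: {"device_id": kv[0], "avg_temperature_c": kv[1]})
            | "WriteAggregated" >> beam.io.WriteToBigQuery(
                target_table,
                schema="device_id:STRING,avg_temperature_c:FLOAT",
                write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
            )
        )
```

Each further transformed dataset would just be another pipeline like this reading from the same `data_dump` table, which is where the flexibility comes from.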