Himanshu Trilokani

Reputation: 11

Real time stream processing for IOT through Google Cloud Platform

I am looking into real-time stream processing for IoT using GCP Pub/Sub and Cloud Dataflow, with analytics performed in BigQuery. I am seeking help on how to implement this. Here is the architecture for IoT real-time stream processing

Upvotes: 1

Views: 593

Answers (1)

Kyle O'Brien

Reputation: 952

I'm assuming you mean that you want to stream some sort of data from outside the Google Cloud Platform into BigQuery.

Unless you're transforming the data somehow, I don't think Dataflow is necessary.

Note that BigQuery has its own streaming API, so you don't necessarily have to use Pub/Sub to get data into BigQuery.

In any case, these are the steps you should generally follow.

Method 1

  1. Create a service account (and download its .json key file from IAM in the Google Cloud Console)
  2. Write your application to get the data you want to stream in
  3. Inside that application, use the service account to stream directly into a BQ dataset and table
  4. Analyse the data on the BigQuery console (https://bigquery.cloud.google.com)
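
A minimal Python sketch of steps 1 to 3, assuming the google-cloud-bigquery client library is installed. The row fields (device_id, temperature_c, event_time), the helper names and the table ID are all hypothetical; only Client.from_service_account_json and insert_rows_json come from the library.

```python
import json
from datetime import datetime, timezone


def make_client(key_path):
    """Build a BigQuery client from the service-account .json file (step 1)."""
    # Imported lazily so the helpers below can be exercised without the
    # google-cloud-bigquery package installed.
    from google.cloud import bigquery
    return bigquery.Client.from_service_account_json(key_path)


def build_row(device_id, temperature_c, event_time=None):
    """Shape one reading as a JSON-serialisable row (hypothetical schema)."""
    if event_time is None:
        event_time = datetime.now(timezone.utc).isoformat()
    return {
        "device_id": device_id,
        "temperature_c": float(temperature_c),
        "event_time": event_time,
    }


def stream_rows(client, table_id, rows):
    """Stream rows into the table (step 3); raise if any row is rejected."""
    # insert_rows_json returns a list of per-row errors, empty on success.
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError("BigQuery rejected rows: " + json.dumps(errors, default=str))
    return len(rows)
```

Usage would look roughly like stream_rows(make_client("key.json"), "my-project.iot.data_dump", [build_row("sensor-1", 21.5)]), where the key path and table ID are placeholders and the table is assumed to exist already with a matching schema.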

Method 2

  1. Set up a Pub/Sub topic
  2. Write an application that collects the information you want to stream in
  3. Push to Pub/Sub
  4. Configure Dataflow to pull from Pub/Sub, transform the data however you need to and push to BigQuery
  5. Analyse the data on the BigQuery console as above.
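
Steps 2 and 3 could be sketched in Python roughly as follows, assuming the google-cloud-pubsub client library. The project ID, topic name, reading fields and helper names are hypothetical; PublisherClient, topic_path and publish are the library calls.

```python
import json


def make_publisher(project_id, topic_name):
    """Return a Pub/Sub publisher and the fully qualified topic path."""
    # Imported lazily so the helpers below can be exercised without the
    # google-cloud-pubsub package installed.
    from google.cloud import pubsub_v1
    publisher = pubsub_v1.PublisherClient()
    return publisher, publisher.topic_path(project_id, topic_name)


def encode_reading(reading):
    """Serialise a reading dict to UTF-8 JSON bytes (Pub/Sub payloads are bytes)."""
    return json.dumps(reading, sort_keys=True).encode("utf-8")


def publish_reading(publisher, topic_path, reading):
    """Publish one reading and block until Pub/Sub returns its message ID."""
    future = publisher.publish(topic_path, data=encode_reading(reading))
    return future.result()
```

Blocking on future.result() for every message keeps the sketch simple; an application with a high message rate would more likely collect the futures and let the client batch the sends.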

Raw Data

If you just want to put very raw data (no processing) into BQ, then I'd suggest using the first method.

Semi Processed / Processed Data

If you actually want to transform the data somehow, then I'd use the second method as it allows you to massage the data first.
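
To give an idea of what "massaging" the data in Dataflow might look like, here is a rough Apache Beam sketch in Python. The transform itself (adding a Fahrenheit column), the field names, the schema string and the subscription and table arguments are all assumptions for illustration; ReadFromPubSub, Map and WriteToBigQuery are Beam's own transforms.

```python
import json


def parse_reading(message_bytes):
    """Decode one Pub/Sub payload and add a derived column (hypothetical transform)."""
    reading = json.loads(message_bytes.decode("utf-8"))
    temperature_c = float(reading["temperature_c"])
    return {
        "device_id": reading["device_id"],
        "temperature_c": temperature_c,
        "temperature_f": temperature_c * 9 / 5 + 32,
        "event_time": reading["event_time"],
    }


def run_pipeline(subscription, table):
    """Pull from Pub/Sub, transform each message, append rows to BigQuery."""
    # Imported lazily so parse_reading can be tested without apache-beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Runner, project and region options are omitted here and would be
    # needed to actually run this on Dataflow.
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(subscription=subscription)
            | "Transform" >> beam.Map(parse_reading)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                table,  # e.g. "my-project:iot.readings" (placeholder)
                schema="device_id:STRING,temperature_c:FLOAT,"
                       "temperature_f:FLOAT,event_time:TIMESTAMP",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )
```

Keeping the transform as a plain function means it can be unit-tested on its own before it is wired into the pipeline.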

Try to always use Method 1

However, I'd usually recommend the first method, even if you want to transform the data somehow.

That way, you have a data_dump table (raw data) in your dataset and you can still use Dataflow after that to transform the data and write the result to an aggregated table.

This gives you maximum flexibility because it allows you to create potentially n transformed datasets from the single data_dump table in BQ.
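
As one illustration of deriving an aggregated table from data_dump, the sketch below uses a plain BigQuery query job with a destination table rather than Dataflow, simply because it is the shortest thing to show; a Dataflow job reading from data_dump would serve the same purpose. The column names, the hourly average and the table IDs are assumptions.

```python
def build_hourly_avg_sql(source_table):
    """Build a query averaging temperature per device per hour (hypothetical columns)."""
    if "`" in source_table:
        raise ValueError("table ID must not contain backticks")
    return (
        "SELECT device_id, "
        "TIMESTAMP_TRUNC(event_time, HOUR) AS hour_start, "
        "AVG(temperature_c) AS avg_temperature_c "
        f"FROM `{source_table}` "
        "GROUP BY device_id, hour_start"
    )


def materialise(client, sql, destination_table):
    """Run the query and overwrite the destination table with its result."""
    # Imported lazily so build_hourly_avg_sql can be tested without the
    # google-cloud-bigquery package installed.
    from google.cloud import bigquery
    job_config = bigquery.QueryJobConfig(
        destination=destination_table,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    client.query(sql, job_config=job_config).result()  # wait for completion
```

Each additional aggregated table would just be another query (or another Dataflow job) over the same raw table.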

Upvotes: 1
