Reputation: 69
I am new to GCP and I want to create a Dataflow pipeline for my project. Long story short, my devices send data to Pub/Sub; after that, I want to make a prediction with an ML model and then write everything to BigQuery and a Firebase Realtime Database. I found this article from Google (I looked at Stream + Micro-batching but failed to implement it) and this GitHub repository, but I really don't know how to run it. If anyone can give me a hand, I would be really grateful.
Would it be easier to implement all of this with Cloud Functions?
Upvotes: 0
Views: 482
Reputation: 75715
There are several ways to address your use case.
First of all, I'm not sure that Dataflow is required. Dataflow is perfect for data transformation, or data comparison as described in the article, but I'm not sure that is your use case. If Dataflow is not needed, here are several proposals (we could dig into one if you want).
This solution is the cheapest because you process the messages in micro-batches (more efficient in processing time) and you perform a load job into BigQuery (which is free, compared to streaming). However, it's not scalable because you keep your data in memory before triggering the load job. If you have more and more data, you can reach the 2 GB memory limit of Cloud Run or Cloud Functions. Increasing the scheduler frequency is not an option because you have a quota of 1000 load jobs per day (1 day = 1440 minutes, so running every minute is not possible).
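To make the quota arithmetic concrete, here is a small sketch. It only uses the figure quoted above (1000 load jobs per day), which may differ from the current BigQuery quota; the function names are made up for illustration.

```python
import math

MINUTES_PER_DAY = 24 * 60   # 1440
LOAD_JOBS_PER_DAY = 1000    # quota figure quoted in the answer; check the current docs


def min_interval_minutes(jobs_per_day: int = LOAD_JOBS_PER_DAY) -> int:
    """Smallest whole-minute schedule interval that stays within the daily quota."""
    return math.ceil(MINUTES_PER_DAY / jobs_per_day)


def jobs_per_day_at(interval_minutes: int) -> int:
    """Number of load jobs triggered per day when running every `interval_minutes`."""
    return MINUTES_PER_DAY // interval_minutes


print(min_interval_minutes())                   # 2
print(jobs_per_day_at(1), jobs_per_day_at(2))   # 1440 720
```

So with that quota, a scheduler firing every minute would exceed it (1440 jobs), while every two minutes stays well below it (720 jobs).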
This solution is highly scalable, and the most expensive one. I recommend Cloud Run, which can process several messages concurrently and thus decreases the billable instance processing time. (I wrote an article on this)
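As a rough illustration of the per-message approach, the sketch below decodes a Pub/Sub push envelope (JSON body with a base64-encoded `message.data` field), runs a prediction and streams one row out. `predict` and `handle_push` are hypothetical names, the payload is assumed to be JSON, and `insert_rows` is an injected callable; in a real service it could wrap something like `bigquery.Client().insert_rows_json(table_id, rows)`, which returns a list of errors (empty on success).

```python
import base64
import json
from typing import Callable


def predict(features: dict) -> float:
    """Hypothetical stand-in for the real ML model call."""
    return float(sum(v for v in features.values() if isinstance(v, (int, float))))


def handle_push(envelope: dict, insert_rows: Callable[[list], list]) -> int:
    """Process one push envelope and return an HTTP status code.

    204 acknowledges the message; an error status makes Pub/Sub redeliver it.
    """
    message = envelope.get("message")
    if not message or "data" not in message:
        return 400  # malformed envelope
    # Assumption: the device payload is a JSON object.
    payload = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
    row = {**payload, "prediction": predict(payload)}
    errors = insert_rows([row])  # streaming write, one row per message
    return 204 if not errors else 500


# Local usage with a fake sink instead of BigQuery.
sent = []

def fake_insert(rows: list) -> list:
    sent.extend(rows)
    return []  # no errors

data = base64.b64encode(json.dumps({"temp": 21.5, "humidity": 40}).encode()).decode()
status = handle_push({"message": {"data": data}}, fake_insert)
print(status, sent)
```

Injecting the sink keeps the handler easy to test locally; the same function could be called from a Cloud Run or Cloud Functions HTTP endpoint.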
Finally, the best option is a mix of both if you don't have to process the messages as soon as possible: schedule a micro-batch that pulls from the Pub/Sub pull subscription. For each message, perform a prediction and stream-write the result to BigQuery (to prevent memory overflow).
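The mixed approach can be sketched as a loop that a scheduled job runs: pull a small batch, handle each message immediately, acknowledge what succeeded, and repeat until the subscription is empty. This is only an outline under assumptions: `drain_subscription` and its parameters are hypothetical, and `pull`, `process` and `ack` are injected callables. In a real job, `pull` and `ack` would presumably wrap `SubscriberClient.pull(...)` and `SubscriberClient.acknowledge(...)` from `google-cloud-pubsub`, and `process` would run the prediction and the streaming insert.

```python
from typing import Callable, Iterable, Tuple


def drain_subscription(
    pull: Callable[[int], Iterable[Tuple[str, dict]]],
    process: Callable[[dict], bool],
    ack: Callable[[list], None],
    batch_size: int = 100,
    max_batches: int = 50,
) -> int:
    """Pull small batches and handle each message right away.

    Only `batch_size` messages are held in memory at a time.
    Returns the number of acknowledged messages.
    """
    handled = 0
    for _ in range(max_batches):
        batch = list(pull(batch_size))
        if not batch:
            break  # nothing left to pull
        # Ack only the messages that were processed successfully;
        # with real Pub/Sub the others are redelivered later.
        ack_ids = [ack_id for ack_id, payload in batch if process(payload)]
        if ack_ids:
            ack(ack_ids)
        handled += len(ack_ids)
    return handled


# Local usage with an in-memory queue standing in for the subscription.
queue = [(f"id-{i}", {"value": i}) for i in range(250)]
written, acked = [], []

def fake_pull(n: int) -> list:
    batch = queue[:n]
    del queue[:n]
    return batch

def fake_process(payload: dict) -> bool:
    written.append({**payload, "prediction": payload["value"] * 2})  # placeholder model
    return True

print(drain_subscription(fake_pull, fake_process, acked.extend))  # 250
```

The `max_batches` cap is there so a single scheduled run stays within the request timeout of the service that hosts it.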
If you really need to use Dataflow in your process, please describe in more detail what you want to achieve so I can give better advice.
In any case, I agree with @JohnHanley's comment: do some Qwiklabs to get an idea of what you can do with the platform!
Upvotes: 3