Reputation: 119
I am writing a consumer that consumes user activity data (activityid, userid, timestamp, cta, duration)
from Google Pub/Sub, and I want to create a sink for it so that I can train my ML model in an online fashion.
Since this sink is the source from which I will fetch a user's last x (say 100) activities to update the ML model, storing the data in user-sharded form (in, say, a NoSQL DB such as Bigtable) would make retrieval easy, but the update operation would be costly: I would have to append to the stored value every time an activity event arrives for that user. Which type of sink should I consider in this situation?
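For concreteness, here is a minimal sketch of the naive read-modify-write append I am worried about, using the google-cloud-bigtable Python client; the project, instance, table, and column names are placeholders, not an existing setup:

```python
import json
from google.cloud import bigtable

# Placeholder names, assumed for illustration only.
client = bigtable.Client(project="my-project")
instance = client.instance("my-instance")
table = instance.table("user-activity")

def append_activity(user_id: str, event: dict) -> None:
    """Naive pattern: read the stored list, append, write it back."""
    row_key = f"user#{user_id}".encode()
    row = table.read_row(row_key)
    history = []
    if row is not None:
        # The whole activity history lives in a single cell value.
        history = json.loads(row.cells["activity"][b"events"][0].value)
    history = (history + [event])[-100:]  # keep only the last 100 activities
    write_row = table.direct_row(row_key)
    write_row.set_cell("activity", b"events", json.dumps(history).encode())
    write_row.commit()
```

Every event costs a full read plus a full rewrite of the user's history, which is the overhead I would like to avoid.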
Upvotes: 1
Views: 63
Reputation: 119
I ended up using Bigtable cell versions, with a garbage-collection policy set to keep the last 100 cell versions; when re-training/updating the ML model, I iterate over the historical cell versions.
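A minimal sketch of this approach with the google-cloud-bigtable Python client; the project, instance, table, and column names are placeholders:

```python
import datetime
import json
from google.cloud import bigtable
from google.cloud.bigtable import column_family, row_filters

# Placeholder names, assumed for illustration only.
client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")
table = instance.table("user-activity")

# One-time setup: a column family whose GC policy keeps only the
# last 100 cell versions, so Bigtable prunes old activities for us.
table.create(column_families={"activity": column_family.MaxVersionsGCRule(100)})

def write_activity(user_id: str, event: dict) -> None:
    """Each event becomes a new cell version under the same column:
    a blind write, with no read-modify-write of the history."""
    row = table.direct_row(f"user#{user_id}".encode())
    row.set_cell(
        "activity",
        b"event",
        json.dumps(event).encode(),
        timestamp=datetime.datetime.now(datetime.timezone.utc),
    )
    row.commit()

def last_activities(user_id: str) -> list[dict]:
    """Fetch up to the 100 most recent cell versions for re-training."""
    row = table.read_row(
        f"user#{user_id}".encode(),
        filter_=row_filters.CellsColumnLimitFilter(100),
    )
    if row is None:
        return []
    cells = row.cells["activity"][b"event"]  # newest version first
    return [json.loads(cell.value) for cell in cells]
```

Note that Bigtable applies garbage collection lazily (at compaction time), so reads can briefly see more than 100 versions; the `CellsColumnLimitFilter` at read time guarantees at most 100 regardless.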
I will update this answer with the final read/write throughput and latency numbers.
Upvotes: 1