amor.fati95

Reputation: 119

Sink for user activity data stream to build Online ML model

I am writing a consumer that reads user activity data (activityid, userid, timestamp, cta, duration) from Google Pub/Sub, and I want to create a sink for it so that I can train my ML model in an online fashion.

This sink is also the source from which I will fetch a user's last x (say 100) activities to update the ML model. If I store the data sharded by user (say in a NoSQL database such as Bigtable), retrieval will be easy, but the update will be costly, because I have to append to the stored value every time an activity event arrives for that user. Which type of sink should I consider in this situation?

Upvotes: 1

Views: 63

Answers (1)

amor.fati95

Reputation: 119

I am using Bigtable cell versions: each activity event is written as a new version of the user's cell, and the garbage collection policy is set to keep the last 100 cell versions. When re-training or updating the ML model, I iterate over the historical cell versions.
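
To make the idea concrete, here is a rough in-memory sketch of the semantics I am relying on. It does not call the Bigtable client at all; `Activity`, `VersionedCellStore`, `write` and `read_latest` are hypothetical names made up for illustration, and the row key is assumed to be the user id. It only models two behaviours: a write adds a new timestamped version instead of rewriting the whole value, and only the newest 100 versions are kept.

```python
from dataclasses import dataclass

MAX_VERSIONS = 100  # mirrors a "keep the last 100 versions" GC policy


@dataclass(frozen=True)
class Activity:
    """One user activity event, with the fields from the question."""
    activity_id: str
    user_id: str
    timestamp: int  # event time, e.g. epoch microseconds
    cta: str
    duration: float


class VersionedCellStore:
    """In-memory stand-in for one versioned cell per user row (illustrative only)."""

    def __init__(self, max_versions=MAX_VERSIONS):
        self._max_versions = max_versions
        self._rows = {}  # user_id -> {timestamp: Activity}

    def write(self, event):
        """Add the event as a new version; no read-modify-write of old data."""
        versions = self._rows.setdefault(event.user_id, {})
        # A write with an already-used timestamp replaces that version.
        versions[event.timestamp] = event
        # Drop the oldest version once the cap is exceeded.
        if len(versions) > self._max_versions:
            del versions[min(versions)]

    def read_latest(self, user_id, limit=None):
        """Return the stored versions for a user, newest first."""
        versions = self._rows.get(user_id, {})
        ordered = [versions[ts] for ts in sorted(versions, reverse=True)]
        return ordered if limit is None else ordered[:limit]


store = VersionedCellStore()
for i in range(150):
    store.write(Activity(f"a{i}", "user-1", i, "click", 1.5))

history = store.read_latest("user-1")  # at most the newest 100 events
```

One caveat on the real service: Bigtable garbage collection runs in the background, so older versions can still be returned for a while after they become eligible for deletion. It is probably safer to also limit the number of cells per column at read time than to rely on the GC policy alone.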

I will update this answer with the final read/write throughput and latencies.

Upvotes: 1
