Reputation: 5734
I'm working on an app that requires clients to subscribe to some rows of a "Data" DynamoDB table. Clients should receive an initial snapshot and then streaming updates through a WebSocket connection.
What is the most efficient way to do so? Or, more precisely...
My current plan is to
When a subscriber comes in, I plan to
Now of course a client might thereby receive a delta update before it receives the snapshot, but that's not an issue in my case (data is versioned and those conflicts can be managed by the client).
My concern is that, by default, step 3 (querying current subscribers) would need to be a strongly consistent database read. Otherwise a subscriber might miss an update. For example: we send out an initial snapshot, then an update comes in, but due to eventual consistency step 3 doesn't see the new subscriber yet, so they miss out!
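To make the race concrete, here is a toy Python model. It is an illustration only and not how DynamoDB actually replicates. A strongly consistent read sees every acknowledged write. An eventually consistent read is assumed to always hit a replica lagging by a fixed `lag_ms`, which is the worst case. The class name, lag value and timings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReplicatedSubscribers:
    """Worst-case toy: EC reads always hit a replica that lags by lag_ms."""
    lag_ms: int                                  # hypothetical replication lag
    writes: list = field(default_factory=list)  # (written_at_ms, subscriber)

    def add_subscriber(self, sub: str, now_ms: int) -> None:
        self.writes.append((now_ms, sub))

    def read(self, now_ms: int, consistent: bool) -> set:
        if consistent:
            # Strongly consistent: the leader has every acknowledged write.
            return {sub for _, sub in self.writes}
        # Eventually consistent: the lagging replica only has older writes.
        return {sub for at, sub in self.writes if at + self.lag_ms <= now_ms}


subs = ToyReplicatedSubscribers(lag_ms=5)
subs.add_subscriber("alice", now_ms=0)
subs.add_subscriber("bob", now_ms=100)  # bob registers, then gets his snapshot
# An update arrives at t=102 ms and step 3 looks up subscribers:
print(sorted(subs.read(now_ms=102, consistent=False)))  # ['alice']  (bob misses it)
print(sorted(subs.read(now_ms=102, consistent=True)))   # ['alice', 'bob']
```

In this model, bob is only missed by a read that lands inside the lag window after his registration. A strongly consistent read always includes him, and so does an eventually consistent read issued after the lag has elapsed.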
That kind of sucks, because we'll likely need to query subscribers quite often (every time an update occurs), and doing consistent reads will slow things down and make them more expensive from a billing perspective!
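For a rough sense of the billing difference, the sketch below follows DynamoDB's documented capacity rules for a Query. The sizes of all items read are summed and rounded up to the next 4 KB. That costs one read capacity unit per 4 KB for a strongly consistent read and half that for an eventually consistent one; on-demand read request units use the same ratio. The function name and item sizes are made up for illustration.

```python
import math


def query_read_units(item_sizes_bytes: list, consistent: bool) -> float:
    """Approximate capacity consumed by one Query page.

    Sums the sizes of the items the query reads, rounds up to the next
    4 KB (4096-byte) boundary, then charges 1 unit per 4 KB for a strongly
    consistent read or 0.5 units per 4 KB for an eventually consistent one.
    Does not model any minimum charge for empty results.
    """
    blocks = math.ceil(sum(item_sizes_bytes) / 4096)
    return blocks * (1.0 if consistent else 0.5)


# Hypothetical: 50 subscriber items of ~200 bytes = 10,000 bytes -> 3 blocks
print(query_read_units([200] * 50, consistent=True))   # 3.0
print(query_read_units([200] * 50, consistent=False))  # 1.5
```

So for the subscriber lookup alone, strongly consistent reads roughly double the read cost per update, since each update triggers one such query.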
Are there any options to improve this?
Ideally, I'd like to insert a step between step 5 and step 6 that says "wait until the data has been pushed out to all replicas, so that all weak reads after this will pick up the new subscriber". But I don't think that is possible; please do correct me if I'm wrong.
Otherwise, I'm considering adding a timestamp to the Subscribers table. Step 3 could then issue two separate queries: a weakly consistent read for Subscribers where Timestamp <= now - 10 minutes, and a strongly consistent read for Subscribers where Timestamp > now - 10 minutes. This relies on the assumption that a subscriber who came in more than 10 minutes ago has propagated to all nodes by now, so every weakly consistent read "should" know about them. Needless to say, this feels VERY dodgy!
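The split-query idea could be sketched as two request dictionaries for boto3's low-level `client.query(**request)`. The schema is entirely hypothetical: a "Subscribers" table with partition key "DataId", plus a local secondary index "ByTimestamp" whose sort key is "Timestamp" (epoch seconds, stored as a Number). An LSI is assumed here for two reasons. First, global secondary indexes do not support `ConsistentRead=True`. Second, putting Timestamp in a `FilterExpression` instead would not save anything, because capacity is charged for items read before filtering. The cutoff also assumes the clocks of the writer and the reader roughly agree.

```python
import time

# Hypothetical schema (not from the question).
TABLE_NAME = "Subscribers"
INDEX_NAME = "ByTimestamp"  # LSI: partition key DataId, sort key Timestamp


def split_subscriber_queries(data_id: str, now: float, settle_seconds: int = 600) -> list:
    """Build request dicts for boto3's low-level client.query(**request)."""
    cutoff = str(int(now) - settle_seconds)
    common = {
        "TableName": TABLE_NAME,
        "IndexName": INDEX_NAME,
        # TIMESTAMP is a DynamoDB reserved word, so it needs an alias.
        "ExpressionAttributeNames": {"#pk": "DataId", "#ts": "Timestamp"},
        "ExpressionAttributeValues": {":d": {"S": data_id}, ":cutoff": {"N": cutoff}},
    }
    # Older subscribers: assumed fully propagated, so a cheaper EC read.
    older = dict(common, KeyConditionExpression="#pk = :d AND #ts <= :cutoff",
                 ConsistentRead=False)
    # Recent subscribers: strongly consistent (allowed on an LSI, not a GSI).
    recent = dict(common, KeyConditionExpression="#pk = :d AND #ts > :cutoff",
                  ConsistentRead=True)
    return [older, recent]


for request in split_subscriber_queries("row-42", now=time.time()):
    print(request["ConsistentRead"], request["KeyConditionExpression"])
# Each request would then be sent with client.query(**request), following
# LastEvaluatedKey / ExclusiveStartKey if the results span several pages.
```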
I'd be keen to hear better ideas, or thoughts on how bad my dodgy idea really is.
Upvotes: 0
Views: 533
Reputation: 7132
Eventual consistency within the base table usually converges in single-digit milliseconds, or perhaps a couple of seconds in rare events such as a leader node failure, where a new leader must be elected. So wait three seconds before doing your eventually consistent (EC) scan, and you can be comfortable that your client won't miss any changes made before it started listening to the stream.
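The ordering suggested here can be sketched as follows. `register` and `read_snapshot` are hypothetical callables: for example, starting to listen to the stream (or writing the subscriber row), and an eventually consistent Query/Scan. `sleep` is injectable only so the ordering can be tested without actually waiting. The three-second default is a heuristic based on typical lag, not a guarantee.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def subscribe_then_snapshot(
    register: Callable[[], None],
    read_snapshot: Callable[[], T],
    settle_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Register first, pause for replication to settle, then do an EC read."""
    register()             # hypothetical: start stream listening / write subscriber
    sleep(settle_seconds)  # heuristic pause; not a consistency guarantee
    return read_snapshot() # hypothetical eventually consistent read
```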
If missing something would truly be catastrophic, and you need to protect against the very rare case where a short pause isn't a sufficient guarantee, then just use strongly consistent reads. That's what they're there for.
Upvotes: 1