narendraj9

Architecture for tracking a remote value

The problem is as follows: at point A (say, a server or a database), we can query or aggregate a value, which takes time to compute (possibly seconds). Once we know the value, we want to send events describing changes to it from point A to point B. B is the remote location that tracks this value.

So, B queries A for the value once and then consumes a stream of diff events to keep the value at B converging to the right value at A.

The complication is that this stream of messages is persistent (e.g. messages in a Kafka topic). Node B can crash and must then be restarted, and it must neither apply any diff twice nor miss any diff event.
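A common way to get this "each diff applied exactly once" behaviour is to store B's value together with the position (offset) of the last diff it applied, in a single atomic write. On restart, B reloads both and skips any diff at or below that offset. The sketch below assumes that all diffs sit in one totally ordered stream (for example, a single Kafka partition) and that the initial snapshot can be tagged with the offset it already reflects, which is exactly the hard part when several A instances write independently. An in-memory list stands in for the topic, a JSON file stands in for B's durable store, and `Diff` and `TrackerB` are hypothetical names.

```python
import json
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class Diff:
    offset: int  # position of the event in one ordered partition (assumed)
    delta: int   # change to apply to the tracked value


class TrackerB:
    """Stores the value and the offset of the last applied diff together."""

    def __init__(self, path: str):
        self.path = path
        if os.path.exists(path):
            with open(path) as f:
                state = json.load(f)
            self.value, self.last_offset = state["value"], state["last_offset"]
        else:
            self.value, self.last_offset = None, -1  # needs a snapshot first

    def load_snapshot(self, value: int, offset: int) -> None:
        # The snapshot must record the last stream offset it already reflects.
        self.value, self.last_offset = value, offset
        self._persist()

    def apply(self, diff: Diff) -> bool:
        if diff.offset <= self.last_offset:
            return False  # already reflected in self.value: skip it
        self.value += diff.delta
        self.last_offset = diff.offset
        self._persist()
        return True

    def _persist(self) -> None:
        # Write a temp file, fsync it, then rename it over the old state. On POSIX
        # filesystems the rename is atomic, so a crash leaves either the old
        # (value, offset) pair or the new one, never a mix of the two.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"value": self.value, "last_offset": self.last_offset}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)


# Usage: the snapshot reflects offsets up to 4, then the stream is replayed twice.
path = os.path.join(tempfile.mkdtemp(), "b_state.json")
b = TrackerB(path)
b.load_snapshot(value=10, offset=4)
stream = [Diff(3, +1), Diff(5, +2), Diff(6, -1)]
for d in stream:
    b.apply(d)
b2 = TrackerB(path)  # simulated restart after a crash
for d in stream:     # full replay: every diff is skipped
    b2.apply(d)
print(b2.value, b2.last_offset)  # 11 6
```

In practice B would resume consuming at `last_offset + 1` after a restart; the skip check is only a safety net for redelivered messages. The key design choice is that the offset lives in the same store, and the same write, as the value, rather than being committed separately.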

What are the possible alternatives for the architecture of this application? Using timestamp checks to ignore diffs would introduce flakiness and make the whole system very hard to reason about. Moreover, it would be incorrect.

If this question isn't appropriate for Stack Overflow, please leave a comment and let me know a better place to ask it.

There are multiple instances of A that act independently and update the value in a shared storage. Versioning the diffs is difficult with multiple source instances because each instance fires and forgets its diff events.

To give an example: say we are tasked with maintaining a priority queue of agents working on a company's support team. Agents are assigned tickets, which they then resolve. We must assign issues to agents fairly, so we need to maintain the number of tickets currently assigned to each agent. Each ticket has an assignee, so to get the count for an agent, we query the tickets table for rows whose assignee is that agent. Once we have queried the table, we consume diff events (emitted every time the agent is assigned a new ticket or resolves an existing one).
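A minimal sketch of what the snapshot and the diffs mean in this example, assuming tickets are plain `(ticket_id, assignee)` rows and each diff event is either an assignment (+1) or a resolution (-1) for one agent. The names `initial_counts`, `apply_event`, and `pick_agent` are hypothetical, and a linear scan stands in for a real priority queue. The sketch deliberately ignores the delivery problem: applying the same event twice would silently corrupt a count, which is exactly what the question is trying to prevent.

```python
from collections import Counter


def initial_counts(tickets, agents):
    # Snapshot: count open tickets per assignee (the slow query at A).
    counts = Counter({agent: 0 for agent in agents})
    for _ticket_id, assignee in tickets:
        counts[assignee] += 1
    return counts


def apply_event(counts, event):
    # Diff: +1 when an agent is assigned a ticket, -1 when they resolve one.
    kind, agent = event
    if kind == "assigned":
        counts[agent] += 1
    elif kind == "resolved":
        counts[agent] -= 1
    else:
        raise ValueError(f"unknown event kind: {kind}")


def pick_agent(counts):
    # Fair assignment: fewest open tickets first, ties broken by name.
    return min(counts, key=lambda agent: (counts[agent], agent))


agents = ["alice", "bob", "carol"]
tickets = [(1, "alice"), (2, "alice"), (3, "bob")]
counts = initial_counts(tickets, agents)  # alice=2, bob=1, carol=0
for event in [("assigned", "carol"), ("resolved", "alice"), ("assigned", "carol")]:
    apply_event(counts, event)
print(dict(counts))        # {'alice': 1, 'bob': 1, 'carol': 2}
print(pick_agent(counts))  # alice
```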


Answers (1)

the4thamigo_uk

You can avoid the problem of a diff being applied twice by hashing (or versioning) the value the diff is applied to, and sending that hash/version along with the diff. B should then apply an incoming diff only if its hash/version matches the hash/version of B's current value. This way A can safely send the same diff multiple times.
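A minimal sketch of this check, assuming each diff carries the version of the value it was computed against (`base_version`) and moves the value to `base_version + 1`. It also assumes something, such as the shared storage, hands out consecutive versions; the question notes that this is the hard part with several A instances. `VersionedDiff` and `Replica` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedDiff:
    base_version: int  # version of the value this diff was computed against
    delta: int


class Replica:
    def __init__(self, value: int, version: int):
        self.value, self.version = value, version

    def apply(self, diff: VersionedDiff) -> bool:
        if diff.base_version != self.version:
            return False  # duplicate (old base) or gap (future base): ignore
        self.value += diff.delta
        self.version += 1
        return True


b = Replica(value=10, version=7)
d = VersionedDiff(base_version=7, delta=+3)
print(b.apply(d), b.apply(d))  # True False (the second delivery is ignored)
print(b.value, b.version)      # 13 8
```

A version number is used here instead of a hash because a hash can repeat: if the value goes from 5 to 6 and back to 5, a redelivered diff computed against the first 5 would match again. A counter that only increases cannot repeat.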

If B ends up in a state where no incoming diff matches the hash/version of its current value, B can decide to reacquire the full value from A. Alternatively, A can periodically broadcast the full value (compressed, if it is large) to "rebase" every B.
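The "reacquire" fallback can be sketched by buffering diffs that arrive ahead of B's version, draining the buffer when the missing diff shows up, and fetching a full snapshot once too many diffs are stuck. The `fetch_snapshot` callable stands in for the slow query at A, and the `max_pending` limit is an arbitrary illustrative threshold; all names here are hypothetical. `rebase` is also what a periodic broadcast from A would call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class VersionedDiff:
    base_version: int  # version of the value this diff was computed against
    delta: int


class ResyncingReplica:
    def __init__(self, fetch_snapshot: Callable[[], Tuple[int, int]], max_pending: int = 3):
        self.fetch_snapshot = fetch_snapshot  # returns (value, version) from A
        self.max_pending = max_pending
        self.pending: Dict[int, VersionedDiff] = {}  # keyed by base_version
        self.value, self.version = fetch_snapshot()

    def receive(self, diff: VersionedDiff) -> None:
        if diff.base_version < self.version:
            return  # already applied: drop the duplicate
        self.pending[diff.base_version] = diff
        self._drain()
        if len(self.pending) > self.max_pending:
            # Too many diffs waiting on a gap that never filled: start over.
            self.rebase(*self.fetch_snapshot())

    def rebase(self, value: int, version: int) -> None:
        # Replace local state with a full snapshot, keep only newer diffs.
        self.value, self.version = value, version
        self.pending = {v: d for v, d in self.pending.items() if v >= version}
        self._drain()

    def _drain(self) -> None:
        # Apply every buffered diff that now lines up with our version.
        while self.version in self.pending:
            d = self.pending.pop(self.version)
            self.value += d.delta
            self.version += 1


# A's history: 10 @v0, +1, +1, +2, +2, +1, so A holds 17 @v5 at the end.
snapshots = iter([(10, 0), (17, 5)])  # what A reports on successive fetches
b = ResyncingReplica(lambda: next(snapshots), max_pending=2)
b.receive(VersionedDiff(0, +1))  # applied: 11 @v1
b.receive(VersionedDiff(0, +1))  # duplicate: ignored
# The diff with base 1 is lost, so the next ones pile up in the buffer.
b.receive(VersionedDiff(2, +2))
b.receive(VersionedDiff(3, +2))
b.receive(VersionedDiff(4, +1))  # 3 pending > 2: fetch a snapshot
print(b.value, b.version)        # 17 5
```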

It is also worth considering whether all this effort is necessary, or whether you could simply broadcast the full value (perhaps compressed). If the data is to be distributed widely, you could have caching servers in each region (this also works with the diff approach).
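A sketch of the full-value alternative, using the ticket counts from the question as the value and the standard `json` and `zlib` modules for serialization and compression. It assumes each snapshot carries some monotonically increasing version read from the shared storage; how that version is produced with several A instances is left open, as in the question. `encode_snapshot` and `SnapshotConsumer` are hypothetical names.

```python
import json
import zlib


def encode_snapshot(version: int, value: dict) -> bytes:
    # A: serialize the full value and compress it for broadcast.
    payload = json.dumps({"version": version, "value": value}, sort_keys=True)
    return zlib.compress(payload.encode("utf-8"))


class SnapshotConsumer:
    def __init__(self):
        self.version = -1
        self.value = {}

    def on_message(self, message: bytes) -> None:
        snapshot = json.loads(zlib.decompress(message).decode("utf-8"))
        # Replacing the whole value is idempotent; the version check only
        # stops an older, late-arriving snapshot from overwriting a newer one.
        if snapshot["version"] > self.version:
            self.version = snapshot["version"]
            self.value = snapshot["value"]


old = encode_snapshot(1, {"alice": 2, "bob": 1})
new = encode_snapshot(2, {"alice": 1, "bob": 1})
b = SnapshotConsumer()
for msg in [new, old, new]:  # out of order, with a duplicate
    b.on_message(msg)
print(b.version, b.value)    # 2 {'alice': 1, 'bob': 1}
```

The trade-off is bandwidth: every message carries the whole value, but B never has to reason about missed or duplicated diffs.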
