AWS Event-Sourcing implementation

Question

I'm quite a newbie to microservices and Event-Sourcing, and I was trying to figure out a way to deploy a whole system on AWS.

As far as I know, there are two ways to implement an Event-Driven architecture:

Using AWS Kinesis Data Stream
Using AWS SNS + SQS

So my base strategy is to convert every command into an event, store that event in DynamoDB, and use DynamoDB Streams to notify the other microservices about each new event. But how? Which of the two solutions above should I use?
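The command-to-event step can be sketched in a few lines. This is a minimal in-memory sketch, assuming a table keyed by a stream id (partition key) and a per-stream version number (sort key); the class and function names are hypothetical. In DynamoDB itself, the duplicate-version check would be a conditional write (a PutItem with a ConditionExpression using attribute_not_exists on the key attribute), and DynamoDB Streams would then emit each newly inserted item.

```python
from dataclasses import dataclass, field
from typing import Any


class ConcurrencyError(Exception):
    """Raised when another writer already appended an event at the same version."""


@dataclass(frozen=True)
class Event:
    stream_id: str        # partition key, e.g. an order id
    version: int          # sort key: 1, 2, 3, ... within one stream
    kind: str
    data: dict[str, Any]


@dataclass
class InMemoryEventStore:
    """Stand-in for a DynamoDB table with primary key (stream_id, version)."""
    items: dict[tuple[str, int], Event] = field(default_factory=dict)

    def current_version(self, stream_id: str) -> int:
        return max((v for (s, v) in self.items if s == stream_id), default=0)

    def append(self, stream_id: str, expected_version: int, kind: str, data: dict[str, Any]) -> Event:
        key = (stream_id, expected_version + 1)
        # Mirrors a conditional PutItem with attribute_not_exists on the key:
        # if another writer took this slot first, the write is rejected.
        if key in self.items:
            raise ConcurrencyError(f"{stream_id} already has version {key[1]}")
        event = Event(stream_id, key[1], kind, data)
        self.items[key] = event
        return event

    def read(self, stream_id: str, after: int = 0) -> list[Event]:
        """Events of one stream with version > after, in version order."""
        return sorted(
            (e for (s, v), e in self.items.items() if s == stream_id and v > after),
            key=lambda e: e.version,
        )


def handle_place_order(store: InMemoryEventStore, order_id: str, amount_cents: int) -> Event:
    """Hypothetical command handler: validate, then record the outcome as an event."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    version = store.current_version(order_id)
    return store.append(order_id, version, "OrderPlaced", {"amount_cents": amount_cents})
```

The version check is what keeps the stream consistent when two instances of the same service handle commands for the same order concurrently: one append wins, the other gets a ConcurrencyError and can reload and retry.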

The first one has the advantages of:

Message ordering
At-least-once delivery

But the disadvantages are quite problematic:

No built-in autoscaling (you can achieve it using triggers)
No message visibility functionality (apparently; can someone confirm that?)
No topic subscription
Very strict read transaction limits: you can improve this using multiple shards, but from what I have read you then need an ill-defined number of Lambdas with different invocation priorities, and an ill-defined strategy to avoid duplicate processing across multiple instances of the same microservice.
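Ordering in Kinesis is per shard, not per stream: records with the same partition key land on the same shard and keep their order there. A sketch of that routing, assuming (as the Kinesis documentation describes) that the partition key is MD5-hashed to a 128-bit integer, and further assuming that the shards split the hash key space into equal ranges, as a freshly created stream roughly does. The function names are illustrative, not part of any AWS SDK.

```python
import hashlib
from collections import defaultdict

HASH_SPACE = 2 ** 128


def shard_for(partition_key: str, shard_count: int) -> int:
    # MD5 of the partition key, read as an unsigned 128-bit integer.
    h = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")
    # Assumption: shards own equal, contiguous slices of the hash key space.
    return h * shard_count // HASH_SPACE


def route(records: list[tuple[str, str]], shard_count: int) -> dict[int, list[tuple[str, str]]]:
    """Append (partition_key, payload) records to shards, keeping arrival order per shard."""
    shards: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for key, payload in records:
        shards[shard_for(key, shard_count)].append((key, payload))
    return dict(shards)
```

Using an aggregate id as the partition key keeps each aggregate's events in order even with many shards. The duplicate-processing concern is usually handled by the Kinesis Client Library, which records shard leases and checkpoints in a DynamoDB table so that each shard is processed by one worker at a time; delivery is still at-least-once, so handlers should be idempotent.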

The second one has the advantages of:

Fully managed
Very high TPS
Topic subscriptions
Message visibility functionality

Drawbacks:

SQS standard queues only provide best-effort ordering, and I still have no idea what that means. The documentation says "A standard queue makes a best effort to preserve the order of messages, but more than one copy of a message might be delivered out of order". Does it mean that, given n copies of a message, the first copy is delivered in order while the others are delivered out of order relative to the other messages' copies? Or could "more than one" be "all"?
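Whichever reading of that sentence is right, a consumer of a standard queue has to cope with both duplicates and reordering. One minimal way to do that, assuming each event carries a per-stream version number (an assumption about the message body, not something SQS adds), is to apply events strictly in version order, drop anything already seen, and buffer early arrivals. The class name is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OrderedApplier:
    """Applies one stream's events in version order despite duplicates and reordering."""
    applied: list[str] = field(default_factory=list)        # payloads, in the order applied
    last_version: int = 0                                    # high water mark
    pending: dict[int, str] = field(default_factory=dict)   # early arrivals, by version

    def receive(self, version: int, payload: str) -> None:
        if version <= self.last_version or version in self.pending:
            return  # duplicate delivery: already applied or already buffered
        self.pending[version] = payload
        # Drain every buffered event that is now contiguous with the high water mark.
        while self.last_version + 1 in self.pending:
            self.last_version += 1
            self.applied.append(self.pending.pop(self.last_version))
```

A gap that never fills (a lost or very late message) leaves events stuck in pending, which is one reason the accepted answer below prefers pulling the ordered list from its source instead of relying on queue order.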

A very big thanks for any advice!

VoiceOfUnreason · Accepted Answer

I'm quite a newbie to microservices and Event-Sourcing

Review Greg Young's talk Polyglot Data for more insight into what follows.

Sharing events across service boundaries has two basic approaches - a push model and a pull model. For subscribers that care about the ordering of events, a pull model is "simpler" to maintain.

The basic idea is that each subscriber tracks its own high water mark (how far into a stream it has processed) and queries an ordered representation of the event list to get updates.
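A minimal sketch of such a subscriber, assuming a hypothetical fetch(after, limit) call that returns (position, event) pairs in order. In practice the high water mark would be stored durably, ideally in the same transaction as the subscriber's own state changes, so a crash cannot separate the two.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Subscriber:
    """Pull-model subscriber that tracks its own high water mark."""
    fetch: Callable[[int, int], list[tuple[int, str]]]  # (after, limit) -> [(position, event)]
    handled: list[str] = field(default_factory=list)
    high_water_mark: int = 0  # position of the last event processed

    def catch_up(self, page_size: int = 2) -> int:
        """Pull pages until a short page signals the end of the stream; return events handled."""
        processed = 0
        while True:
            page = self.fetch(self.high_water_mark, page_size)
            for position, event in page:
                self.handled.append(event)
                self.high_water_mark = position  # advance only after handling succeeds
                processed += 1
            if len(page) < page_size:
                return processed
```

Because the subscriber asks for "everything after my mark", running catch_up twice, or after a restart, never double-applies an event.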

In AWS, you would normally get this representation by querying the authoritative service for the updated event list (the implementation of which could include paging). The service might provide the list of events by querying DynamoDB directly, or by getting the most recent key from DynamoDB and then looking up cached representations of the events in S3.
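The second variant (latest position from DynamoDB, event bodies from an S3 cache) might look roughly like this on the service side. Plain dicts stand in for the DynamoDB item and the S3 objects, and both the endpoint and the key layout events/&lt;stream&gt;/&lt;position&gt; are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class EventFeed:
    """Hypothetical read endpoint of the service that owns the events."""
    latest_position: dict[str, int]   # stand-in for a small DynamoDB item per stream
    cache: dict[str, str]             # stand-in for S3 objects, one per event

    def events_after(self, stream_id: str, after: int, limit: int) -> dict[str, object]:
        head = self.latest_position.get(stream_id, 0)
        end = max(after, min(head, after + limit))
        events = [self.cache[f"events/{stream_id}/{p}"] for p in range(after + 1, end + 1)]
        # The caller pages by passing next_after back until at_head is True.
        return {"events": events, "next_after": end, "at_head": end >= head}
```

Since an event never changes once written, each cached object is immutable, which is what makes caching it safe; only the small "latest position" lookup has to be fresh.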

In this approach, the "events" that are being pushed out of the system are really just notifications, allowing the subscribers to reduce the latency between the write into Dynamo and their own read.

I would normally reach for SNS (fan-out) for broadcasting notifications. Consumers that need bookkeeping support for which notifications they have handled would use SQS. But the primary channel for communicating the ordered events is pull.
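A sketch of that combination, with in-memory stand-ins: the topic copies each notification into one queue per subscriber (the SNS-to-SQS fan-out), and the notification carries only a stream id and its latest version. Because the consumer treats a notification as a hint to pull, duplicate and out-of-order notifications are harmless. All names here are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Topic:
    """Stand-in for an SNS topic fanning out to one SQS-like queue per subscriber."""
    queues: dict[str, deque] = field(default_factory=dict)

    def subscribe(self, name: str) -> deque:
        return self.queues.setdefault(name, deque())

    def publish(self, notification: dict) -> None:
        for queue in self.queues.values():
            queue.append(dict(notification))  # each subscriber gets its own copy


@dataclass
class NotifiedConsumer:
    """Consumer that uses notifications only to decide when to pull."""
    pull: Callable[[str, int, int], None]                 # (stream_id, after, up_to): fetch from the owner
    marks: dict[str, int] = field(default_factory=dict)  # high water mark per stream

    def drain(self, queue: deque) -> int:
        pulls = 0
        while queue:
            note = queue.popleft()
            after = self.marks.get(note["stream_id"], 0)
            if note["latest_version"] > after:  # stale or duplicate hints are skipped
                self.pull(note["stream_id"], after, note["latest_version"])
                # Simplification: assumes the pull fetched and handled everything up to latest_version.
                self.marks[note["stream_id"]] = note["latest_version"]
                pulls += 1
        return pulls
```

Each subscriber's queue gives it its own bookkeeping of which notifications it has handled, while the ordering guarantee comes entirely from the pull.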

I myself haven't looked hard at Kinesis (there's some general discussion in earlier questions), but I think Kevin Sookocheff is onto something when he writes

...if you dig a little deeper you will find that Kinesis is well suited for a very particular use case, and if your application doesn’t fit this use case, Kinesis may be a lot more trouble than it’s worth.

Kinesis’ primary use case is collecting, storing and processing real-time continuous data streams. Data streams are data that are generated continuously by thousands of data sources, which typically send in the data records simultaneously, and in small sizes (order of Kilobytes).

Another thing: the fact that I'm accessing data from another microservice's stream is an anti-pattern, isn't it?

Well, part of the point of dividing a system into microservices is to reduce the coupling between the capabilities of the system. Accessing data across the microservice boundaries increases the coupling. So there's some tension there.

But basically, if I'm using a pull model, I need to read data from other microservices' streams. Is that avoidable?

If you query the service you need for the information, rather than digging it out of the stream yourself, you reduce the coupling -- much like asking a service for data rather than reaching into an RDBMS and querying the tables yourself.
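In code, the difference shows up in what the consumer depends on. A minimal sketch, with hypothetical names: fulfillment depends on a narrow query contract, and billing is free to keep (and change) its own events behind it.

```python
from typing import Protocol


class PaymentStatus(Protocol):
    """The contract billing exposes; its event stream stays private."""
    def is_paid(self, order_id: str) -> bool: ...


class BillingService:
    def __init__(self) -> None:
        # Private storage: other services never read this directly.
        self._events: list[tuple[str, str]] = []

    def record(self, order_id: str, event_type: str) -> None:
        self._events.append((order_id, event_type))

    def is_paid(self, order_id: str) -> bool:
        # Fold billing's own events into the answer the caller actually needs.
        paid = False
        for oid, event_type in self._events:
            if oid == order_id:
                if event_type == "PaymentReceived":
                    paid = True
                elif event_type == "PaymentRefunded":
                    paid = False
        return paid


def can_ship(order_id: str, billing: PaymentStatus) -> bool:
    # Fulfillment depends on the question, not on billing's event schema.
    return billing.is_paid(order_id)
```

If billing later renames or splits its events, only is_paid changes; fulfillment is untouched.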

If you can avoid sharing the information between services at all, then you get even less coupling.

(Naive example: order fulfillment needs to know when an order has been paid for; so it needs a correlation id when the payment is made, but it doesn't need any of the other billing details.)
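That naive example could be sketched as an explicit translation at the boundary: billing keeps its detailed internal event and publishes a narrow one carrying only the correlation id. The event and field names are made up for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PaymentAccepted:
    """Billing-internal event: never leaves the billing service."""
    order_id: str
    card_last4: str
    amount_cents: int
    billing_address: str


@dataclass(frozen=True)
class OrderPaid:
    """Public event: only the correlation id that fulfillment needs."""
    order_id: str


def to_public(event: PaymentAccepted) -> str:
    """Translate the internal event into the narrow message other services see."""
    return json.dumps(asdict(OrderPaid(order_id=event.order_id)))
```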
