Event Sourcing: concurrently creating conflicting events

Question

I am trying to implement an Event Sourcing system using Kafka and have run into the following issue. During a new user sign-up I want to check if the username the user provided is already taken. However, consider the case where 2 users are trying to sign-up at the same time providing the same username.

In my understanding of how ES works the controller that processes the sign-up request will check if the request is valid, it will then send a new event (e.g. NewUser) to Kafka, and finally that event will be picked up by another controller which will persist it in a materialized view (e.g. Postgres DB). The problem is that the validation of the request is done against the materialized view but the actual persistence to it happens later. So because the 2 requests are being processed in parallel (by different service instances) they might both pass the validation, resulting in 2 NewUser messages. However, when the second controller tries to persist those 2 NewUser messages in the database saving the second event will fail because of the violation of the uniqueness constraint for the username.

Any ideas on how to address this?

Thanks.

UPDATE:

In particular, I would like to verify whether the following are accepted approaches to the problem:

use the username as the userId (restrictive)
send an event to a topic partitioned by username and when validation is done send an event to another topic

gorkem · Accepted Answer

Initial validation against the materialized view won't be enough in most scenarios where you have constraints. There can always be some relevant events haven't been materialized yet. There are two main concurrency control approaches to ensure that correct results are generated:

1. Pessimistic approach: If you want to validate constraints before you publish an event, you need to lock relevant resources (entity, aggregate or data set). The locking means your services must not be able to publish events on these resources. After this point, to get the current state of your data:

You can wait until all events published before locking are materialized.
You can read current state from the database and apply events on it in a separate process.

2. Optimistic approach: In this approach, you perform your validations after publishing events. To achieve this, you need to implement a feedback mechanism. The process which consumes events and performs validations should be able to publish validation results. You can perform the validations in-memory when possible. Otherwise, you can rely on your materialized data store.

Martin Kleppman talks about a two-step solution for exactly the same problem here and in his book. In this solution, there are two topics: "claims" and "registrations". First, you publish a claim to take the username, then try to write it to the database, and finally publish the result to the registrations topic. At conceptual level, it follows the same steps in the second approach you have mentioned. In validation step, it avoids implementing validation logic and keeping secondary indexes in memory by relying on the database.

Event Sourcing: concurrently creating conflicting events

UPDATE:

Answers (2)

Related Questions