What does data look like when using Event Sourcing?

Question

I'm trying to understand how Event Sourcing changes the data architecture of a service. I've been doing a lot of research, but I can't seem to understand how data is supposed to be properly stored with event sourcing.

Let's say I have a service that keeps track of vehicles transporting packages. The current non-relational data model has one document per vehicle, with many fields representing origin location, destination location, types of packages, number of packages, status of the vehicle, etc. Normally this document is queried so its information can be shown on the front end. When the user makes changes, the document is updated accordingly.

With event sourcing, it seems that a snapshot of every event is stored, but there seem to be a few ways to interpret that:

The first is that multiple versions of the document I described exist, each a new snapshot taken every time a change is made. Each event would create a new, altered version of this document. This is the easiest way for me to wrap my head around it, but I believe this to be incorrect.

Another interpretation I have is that each event stores SPECIFIC information about what's been altered in the document. When the vehicle status changes from On Road to Available, for example, an event specifically for vehicle status changes is triggered. Let's say it's called VehicleStatusUpdatedEvent, and contains the Vehicle ID number, the new status, and the timestamp for this event. So this event is stored and is published to a messaging queue. When picked up from the queue, the appropriate changes are made to the current version of the document. I can understand this, but I think I still have some misconceptions here. My understanding is that event sourcing allows us to have a snapshot of data upon each change, so we can know what it looks like at any point. What I just described would keep a log of changes, but still only have one version of the file, as the events only contain specific pieces of the whole file.

Can someone describe how the data flow and architecture works with event sourcing? Using the vehicle data example I provided might help me frame it better. I feel that I am close to understanding this, but I am missing some fundamental pieces that I can't seem to understand by searching online.

VoiceOfUnreason · Accepted Answer

The current non relational structure for the data model is that each document represents a vehicle

OK, let's start from there.

In the data model you've described, storage of a document destroys the earlier copy.

Now imagine that instead we were storing the document in a git repository. Then saving the document would also save metadata, and that metadata would include a pointer to the previous document.

Of course, we've probably got a lot of duplication in that case. So instead of storing the complete document every time, we'll store a patch document (think JSON Patch), and metadata pointing to the previous patch.
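As a rough sketch of the "base document plus patches" idea, the snippet below applies JSON-Patch-style operations to a vehicle document. The field names and the `apply_patch` helper are assumptions made for illustration, and only top-level `add`, `replace`, and `remove` operations are handled; this is not a full JSON Patch implementation.

```python
import copy


def apply_patch(document, patch):
    """Return a patched copy of document; the input is left untouched.

    Simplification: only top-level paths such as "/status" are supported.
    """
    result = copy.deepcopy(document)
    for op in patch:
        key = op["path"].lstrip("/")
        if op["op"] in ("add", "replace"):
            result[key] = op["value"]
        elif op["op"] == "remove":
            del result[key]
        else:
            raise ValueError(f"unsupported op: {op['op']}")
    return result


# One base document plus a list of small patches, instead of full copies.
base = {"vehicle_id": "V-1", "status": "Available", "package_count": 0}
patches = [
    [
        {"op": "replace", "path": "/status", "value": "On Road"},
        {"op": "replace", "path": "/package_count", "value": 12},
    ],
    [{"op": "replace", "path": "/status", "value": "Available"}],
]

current = base
for patch in patches:
    current = apply_patch(current, patch)

print(current)
```

The patches say which fields changed, but not why they changed. That gap is what the next step addresses.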

Take that same idea again, but instead of storing generic patch documents, we use domain specific messages that describe what is going on in terms of the model.

That's what the data model of an event sourced entity looks like: a list of domain specific descriptions of document transformations.
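To make that concrete, here is a minimal sketch of what the stored data for one vehicle might look like. `VehicleStatusUpdatedEvent` is the name used in the question; `VehicleRegistered`, `PackagesLoaded`, and all field names are hypothetical, chosen only for illustration.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class VehicleRegistered:
    vehicle_id: str
    origin: str
    destination: str
    occurred_at: datetime


@dataclass(frozen=True)
class PackagesLoaded:
    vehicle_id: str
    package_type: str
    count: int
    occurred_at: datetime


@dataclass(frozen=True)
class VehicleStatusUpdatedEvent:
    vehicle_id: str
    new_status: str
    occurred_at: datetime


def at(hour):
    """Helper for readable timestamps on a single example day."""
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)


# The "data model" for vehicle V-1 is this ordered list, not a single document.
stream = [
    VehicleRegistered("V-1", "Depot A", "Depot B", at(8)),
    PackagesLoaded("V-1", "parcel", 12, at(9)),
    VehicleStatusUpdatedEvent("V-1", "On Road", at(10)),
    VehicleStatusUpdatedEvent("V-1", "Available", at(17)),
]

for event in stream:
    print(type(event).__name__, asdict(event))
```

Each entry is small and immutable, and the list only ever grows by appending.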

When you need to reconstitute the current state, you start with a state you know (which could be the "null" state of the document before anything happened to it), and replay onto that document all of the patches (events) that have occurred since.

If you want to do a temporal query, the game is the same: you replay the events up to the point in time that you are interested in.
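A temporal query can be sketched as a replay that stops early. The `state_as_of` helper, the `at` field, and the `VehicleRegistered` event are hypothetical names, and the sketch assumes the events are stored in write order with non-decreasing timestamps.

```python
from datetime import datetime, timezone


def at(hour):
    """Helper for readable timestamps on a single example day."""
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)


events = [
    {"type": "VehicleRegistered", "vehicle_id": "V-1", "at": at(8)},
    {"type": "VehicleStatusUpdatedEvent", "vehicle_id": "V-1", "new_status": "On Road", "at": at(9)},
    {"type": "VehicleStatusUpdatedEvent", "vehicle_id": "V-1", "new_status": "Available", "at": at(17)},
]


def apply(state, event):
    """Return the document state after one event."""
    if event["type"] == "VehicleRegistered":
        return {"vehicle_id": event["vehicle_id"], "status": "Available"}
    if event["type"] == "VehicleStatusUpdatedEvent":
        return {**state, "status": event["new_status"]}
    raise ValueError(f"unknown event type: {event['type']}")


def state_as_of(events, moment):
    """Replay only the events recorded at or before moment."""
    state = None
    for event in events:
        if event["at"] > moment:
            break  # assumes write order matches timestamp order
        state = apply(state, event)
    return state


print(state_as_of(events, at(12)))
print(state_as_of(events, at(18)))
```

Asking for a moment before the first event returns `None`, the "null" state of the document.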

So essentially when referring to an older build, you reconstruct the document using the events, correct?

Yes, that's exactly right.

So is there still a "current status" document or is that considered bad practice?

"It depends". In the general case, there is no current status document; only the write-ordered list of events is "real", and everything else is derived from that.

Conversations about event sourcing often lead to consideration of dedicated message stores for managing persistence of those ordered lists, and it is common that the message stores do not also support document storage. So trying to keep a "current version" around would require commits to two different stores.

At this point, designers typically either decide that "recent version" is good enough, in which case they build eventually consistent representations of documents outside of the transaction boundary... OR they decide current version is important, and look into storage solutions that support storing the current version in the same transaction as the events (for example, an RDBMS).
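For the second option, a minimal sketch using SQLite from the Python standard library might look like the following. The table layout, the `record_status_change` function, and the optimistic version check are all assumptions for illustration, not a prescribed schema. The point is only that the event and the current document are written in one transaction, so either both commit or neither does.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (stream_id TEXT, version INTEGER, type TEXT, data TEXT,"
    " PRIMARY KEY (stream_id, version))"
)
conn.execute(
    "CREATE TABLE vehicle_current (vehicle_id TEXT PRIMARY KEY, version INTEGER, document TEXT)"
)


def record_status_change(conn, vehicle_id, expected_version, new_status):
    """Append an event and update the current document atomically."""
    with conn:  # commits on success, rolls back if an exception escapes
        row = conn.execute(
            "SELECT version, document FROM vehicle_current WHERE vehicle_id = ?",
            (vehicle_id,),
        ).fetchone()
        version = row[0] if row else 0
        document = json.loads(row[1]) if row else {"vehicle_id": vehicle_id}
        if version != expected_version:
            raise RuntimeError(f"expected version {expected_version}, found {version}")
        document["status"] = new_status
        conn.execute(
            "INSERT INTO events (stream_id, version, type, data) VALUES (?, ?, ?, ?)",
            (vehicle_id, version + 1, "VehicleStatusUpdatedEvent",
             json.dumps({"new_status": new_status})),
        )
        conn.execute(
            "INSERT OR REPLACE INTO vehicle_current (vehicle_id, version, document)"
            " VALUES (?, ?, ?)",
            (vehicle_id, version + 1, json.dumps(document)),
        )


record_status_change(conn, "V-1", 0, "On Road")
record_status_change(conn, "V-1", 1, "Available")
print(conn.execute("SELECT * FROM vehicle_current").fetchall())
```

The events table remains the source of truth here; `vehicle_current` is a convenience that could be rebuilt from the events at any time.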

what is the procedure used to generate the snapshot you want using the events?

IF you want to generate a snapshot, then you'll normally end up using a pattern called a projection, to iterate over the events and either fold or reduce them to create the document.

Roughly, you have a function somewhere that looks like

document-with-meta-data = projection(event-history-with-metadata)
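A minimal sketch of such a function, written as a fold with `functools.reduce`. The envelope shape (`type`, `data`, `meta`), the `evolve` step function, and the metadata fields are assumptions for illustration; real message stores each have their own envelope format.

```python
from functools import reduce


def evolve(document, envelope):
    """One fold step: the document after a single event."""
    data = envelope["data"]
    if envelope["type"] == "VehicleRegistered":
        return {"vehicle_id": data["vehicle_id"], "status": "Available"}
    if envelope["type"] == "VehicleStatusUpdatedEvent":
        return {**document, "status": data["new_status"]}
    return document  # this projection ignores event types it does not care about


def projection(history):
    """Fold an event history (with metadata) into a document (with metadata)."""
    document = reduce(evolve, history, {})
    meta = {
        "version": len(history),
        "as_of": history[-1]["meta"]["recorded_at"] if history else None,
    }
    return {"document": document, "meta": meta}


history = [
    {"type": "VehicleRegistered", "data": {"vehicle_id": "V-1"},
     "meta": {"recorded_at": "2024-01-01T08:00:00Z"}},
    {"type": "VehicleStatusUpdatedEvent", "data": {"new_status": "On Road"},
     "meta": {"recorded_at": "2024-01-01T09:00:00Z"}},
    {"type": "VehicleStatusUpdatedEvent", "data": {"new_status": "Available"},
     "meta": {"recorded_at": "2024-01-01T17:00:00Z"}},
]

view = projection(history)
print(view)
```

Because the projection is a pure function of the history, different projections can build different documents from the same events, and any of them can be discarded and rebuilt.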
