Reputation: 260
I'd like to start an adventure with Event Sourcing. As a playground I have a system that gathers data from a set of Sensors organized in Arrays. Each Sensor has a single value, like temperature. What I need from this system is the current value of each Sensor plus a one-month history of readings.
The number of Arrays and Sensors is growing, and each Array produces many readings per second.
My first idea was to make the Array an Aggregate with the Sensors as its entities. In that case every Sensor reading would bump the Array Aggregate's version, which gives more than 10M changes per month. With this design I can't cut off old events, and I don't even want to think about how long it would take to restore the ReadModels after a year of data.
Alternatively, I think I could store the current state in a CRUD table and remove the Sensors' current data from the Array, keeping just the definitions. A service would then handle the Sensor data stream, check the Array "status", and keep the Array "status" as a separate Aggregate. The service would emit a "Sensor data update" event, which would trigger the ReadModel that keeps the historical data under the 1-month constraint. That way I don't pollute the event store with Sensor reading events, and for the Array "status" I can remove whole past "status" Aggregates from the event store. Arrays would keep only the Sensor definitions, so the EventStore would stay relatively small.
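Roughly, the write path I have in mind looks like this (just a sketch; all names are placeholders):

```typescript
interface SensorReading {
  arrayId: string;
  sensorId: string;
  value: number;        // e.g. temperature
  timestamp: Date;
}

interface SensorDataUpdated {
  type: "SensorDataUpdated";
  arrayId: string;
  sensorId: string;
  value: number;
  timestamp: Date;
}

// The service sits outside the Array Aggregate: it keeps current values in a
// plain CRUD table and only publishes an event for the history ReadModel.
class SensorIngestService {
  constructor(
    private currentState: Map<string, number>,              // stand-in for the CRUD table
    private publish: (e: SensorDataUpdated) => Promise<void>,
  ) {}

  async handle(reading: SensorReading): Promise<void> {
    // 1. Upsert the current value (no event-sourced Aggregate involved).
    this.currentState.set(`${reading.arrayId}/${reading.sensorId}`, reading.value);

    // (Checks against the separate Array "status" Aggregate would hook in here.)

    // 2. Emit the event that the 1-month history ReadModel subscribes to.
    await this.publish({
      type: "SensorDataUpdated",
      arrayId: reading.arrayId,
      sensorId: reading.sensorId,
      value: reading.value,
      timestamp: reading.timestamp,
    });
  }
}
```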
The downside: I lose the complete history, so I can't rebuild my 1-month signal history ReadModel, and I'd have to pay extra attention not to break it.
The goal is to learn how to scale an Event Sourcing / CQRS system: how to handle a large EventStore and rebuild damaged ReadModels, or populate new ones, within hours rather than days.
Does this idea fit into ES / CQRS? (EDIT: is it OK to update a ReadModel from an event stream that does not come from an Aggregate?)
How do I handle a growing event store and fix broken ReadModels?
Thanks!
Upvotes: 0
Views: 2793
Reputation: 57377
Does this idea fit into ES / CQRS?
One of the things that you need to be really careful about is understanding which information is under the control of your domain model, and which belongs to something outside of it.
If your sensors are physical devices in the real world, broadcasting readings, then your domain model is not the authority. That sensor data is probably going to be read, validated (i.e., checked for corruption in transit) and stored. In other words, the sensor measurements are events (past), not commands (imperative). Throw them into a convenient data store.
With that in mind, you need to look carefully at whether your arrays are domain entities (reading in sensor data, and making interesting decisions) or projections (a reorganization of the streams of sensor measurements).
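To make that distinction concrete, here is a rough sketch of the "projection" reading of an Array (the types and names are illustrative, not from any particular framework):

```typescript
// Sensor measurements are facts that already happened; we just record them.
interface SensorMeasured {
  type: "SensorMeasured";
  arrayId: string;
  sensorId: string;
  value: number;
  recordedAt: Date;
}

// An "array" here is not an aggregate making decisions; it is a projection:
// a reorganization of the measurement streams into a convenient shape.
interface ArrayView {
  arrayId: string;
  latestBySensor: Map<string, number>;
}

function projectArray(arrayId: string, events: SensorMeasured[]): ArrayView {
  const view: ArrayView = { arrayId, latestBySensor: new Map() };
  for (const e of events) {
    if (e.arrayId === arrayId) {
      view.latestBySensor.set(e.sensorId, e.value);
    }
  }
  return view;
}
```

If, on the other hand, an Array has to make interesting decisions based on its sensor data, then it starts to look like a domain entity or a process.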
It may be useful to review When to avoid CQRS, by Udi Dahan. One of the things he talks about there is that, when done right, aggregates look like processes.
In short, make sure that you are applying the right tools to your problem.
That said, yes -- if you have so many events that folding them into a projection isn't easy, then it is hard. You have to look at how much budget you have to solve the problem, and start digging into more I/O-efficient representations of your events, more memory-efficient representations of your events, batching, and so on, and trying to find different ways to partition the work among different cores.
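As a very rough illustration of what batching and partitioning can look like when rebuilding a projection (the event store interface here is an assumption, not a real API):

```typescript
interface StoredEvent {
  streamId: string;
  sequence: number;
  payload: unknown;
}

// Hypothetical store that can be read in pages from a global position.
interface EventLog {
  readBatch(fromPosition: number, batchSize: number): Promise<StoredEvent[]>;
}

async function rebuildProjection(
  log: EventLog,
  apply: (e: StoredEvent) => void,
  workerCount: number,
  workerIndex: number,
  batchSize = 5_000,
): Promise<void> {
  let position = 0;
  for (;;) {
    // Read events in large batches instead of one at a time.
    const batch = await log.readBatch(position, batchSize);
    if (batch.length === 0) break;
    for (const e of batch) {
      // Simple hash partitioning: each worker folds only "its" streams,
      // so several cores can rebuild in parallel.
      if (hash(e.streamId) % workerCount === workerIndex) apply(e);
    }
    position += batch.length; // simplification: assumes a dense global position
  }
}

function hash(s: string): number {
  let h = 0;
  for (let i = 0; i < s.length; i++) h = (h * 31 + s.charCodeAt(i)) >>> 0;
  return h;
}
```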
LMAX did a pretty good job documenting the lessons they learned in processing high volume message streams; search for information about their architecture.
Upvotes: 1
Reputation: 1451
Aggregates with lots of events
An Aggregate is a write-side concept (the C in CQRS). An Aggregate receives a command and, using its state, emits events into the event store. That state is itself built from the events already in the event store, so if there are a lot of events for a given aggregate, building the state takes time.
In order to speed up building an aggregate's state, CQRS/ES frameworks use snapshots - a serialized aggregate state stored at a particular aggregate version, so that you build the state not from the beginning of time but from the latest snapshot. You can store a snapshot for, say, every 100 events. And don't forget to rebuild the snapshots if your projection function changes. Frameworks such as reSolve do this for you transparently.
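A minimal sketch of snapshot-assisted loading, assuming a store that can return the latest snapshot plus the events recorded after it (the interfaces are made up for illustration):

```typescript
interface DomainEvent { aggregateId: string; version: number; payload: unknown; }
interface Snapshot<S> { aggregateId: string; version: number; state: S; }

// Hypothetical event store with snapshot support.
interface SnapshottingEventStore {
  loadEvents(aggregateId: string, fromVersion: number): Promise<DomainEvent[]>;
  loadLatestSnapshot<S>(aggregateId: string): Promise<Snapshot<S> | null>;
  saveSnapshot<S>(snapshot: Snapshot<S>): Promise<void>;
}

const SNAPSHOT_EVERY = 100;

async function loadAggregateState<S>(
  store: SnapshottingEventStore,
  aggregateId: string,
  initialState: S,
  evolve: (state: S, event: DomainEvent) => S,   // the fold / projection function
): Promise<{ state: S; version: number }> {
  const snapshot = await store.loadLatestSnapshot<S>(aggregateId);
  let state = snapshot ? snapshot.state : initialState;
  let version = snapshot ? snapshot.version : 0;

  // Replay only the events recorded after the snapshot, not the whole history.
  const events = await store.loadEvents(aggregateId, version + 1);
  for (const e of events) {
    state = evolve(state, e);
    version = e.version;
    if (version % SNAPSHOT_EVERY === 0) {
      await store.saveSnapshot({ aggregateId, version, state });
    }
  }
  return { state, version };
}
```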
Your scenario
In your particular case it seems to me that your business logic is trivial: you don't need aggregate state to calculate anything or to make a decision. There is essentially no business logic - you just store events as they are generated by the sensors. So in your custom framework you can skip building aggregate state on the write side altogether and simply append events as sensor data comes in.
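A sketch of what that thin write side could look like (the names and the store interface are placeholders):

```typescript
interface SensorMeasured {
  type: "SensorMeasured";
  arrayId: string;
  sensorId: string;
  value: number;
  recordedAt: Date;
}

// Hypothetical append-only store.
interface AppendOnlyStore {
  append(streamId: string, event: SensorMeasured): Promise<void>;
}

async function onSensorData(store: AppendOnlyStore, e: SensorMeasured): Promise<void> {
  // No state is loaded and no invariants are checked - the reading already
  // happened, so we just record it under the sensor's stream.
  await store.append(`sensor-${e.sensorId}`, e);
}
```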
On the read side you would consume the event stream as usual: upon receiving an event, you store it in the Read Model database with whatever categorization or time slots you need.
If you don't need old data in the ReadModel, you can simply skip old events during a rebuild - that should be very fast.
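A sketch covering both points - bucketing readings into time slots on the read side and skipping events outside the 1-month window during a rebuild (names are illustrative):

```typescript
interface SensorMeasured {
  sensorId: string;
  value: number;
  recordedAt: Date;
}

const RETENTION_MS = 30 * 24 * 60 * 60 * 1000; // ~1 month

// readModel maps "sensorId|hour-bucket" -> readings that fell into that hour.
function applyToReadModel(
  readModel: Map<string, number[]>,
  e: SensorMeasured,
  now: Date = new Date(),
): void {
  // Skip anything older than the retention window (cheap during rebuilds).
  if (now.getTime() - e.recordedAt.getTime() > RETENTION_MS) return;

  const hourBucket = new Date(e.recordedAt).setMinutes(0, 0, 0); // epoch ms of the hour
  const key = `${e.sensorId}|${hourBucket}`;
  const bucket = readModel.get(key) ?? [];
  bucket.push(e.value);
  readModel.set(key, bucket);
}
```

A rebuild then becomes a single pass over roughly one month of events rather than the whole store.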
If you don't want to keep old events in the event store, you can delete them, but then it isn't really event sourcing anymore.
Upvotes: 1