Performance issues with large datasets

Is there any way to filter the events in a projection associated with a read model by aggregateId?

In the tests we carried out, we always receive all registered events. Is it possible to apply filters at an earlier stage?

We have 100,000 aggregateIds, each with 15,000 associated events. Because we cannot filter by aggregateId, our projections have to iterate over all events.

Upvotes: 3

Views: 287

Answers (1)

Roman Eremin

Reputation: 1451

So you have 100,000 aggregates with 15,000 events each.

You can use a Read Model or a View Model:

Read Model:

A read model can be seen as a read-side database for your app. If you want to store some data about each aggregate, you insert or update a row or entry in some table for each aggregate; see the Hacker News example read model code.
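The per-aggregate row approach can be sketched as follows. This is a minimal, self-contained illustration of the projection shape (an `Init` handler plus one handler per event type, each receiving a store and an event); the in-memory store, the table name `Stories`, and event types like `STORY_CREATED` are illustrative assumptions, not from the reSolve API itself:

```javascript
// Minimal in-memory stand-in for a read model store (illustrative only).
const store = {
  tables: {},
  defineTable(name) { this.tables[name] = new Map(); },
  insert(name, row) { this.tables[name].set(row.id, row); },
  update(name, id, patch) { Object.assign(this.tables[name].get(id), patch); },
  find(name, id) { return this.tables[name].get(id); }
};

// Projection: one row per aggregate, keyed by aggregateId.
const projection = {
  Init(store) {
    store.defineTable('Stories');
  },
  STORY_CREATED(store, event) {
    store.insert('Stories', {
      id: event.aggregateId,
      title: event.payload.title,
      upvotes: 0
    });
  },
  STORY_UPVOTED(store, event) {
    const row = store.find('Stories', event.aggregateId);
    store.update('Stories', event.aggregateId, { upvotes: row.upvotes + 1 });
  }
};

// Each event only touches the row for its own aggregateId.
projection.Init(store);
projection.STORY_CREATED(store, { aggregateId: 'story-1', payload: { title: 'Hello' } });
projection.STORY_UPVOTED(store, { aggregateId: 'story-1' });
console.log(store.find('Stories', 'story-1').upvotes); // → 1
```

Queries then read the prebuilt rows directly instead of iterating over events.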

It is important to understand that reSolve read models are built on demand, on the first query. If you have a lot of events, this may take some time.

Another thing to consider: a newly created reSolve app is configured to use an in-memory database for read models, so they are rebuilt on each app start.

If you have a lot of events and don't want to wait for read models to rebuild each time you start the app, you have to configure real database storage for your read models.

Configuring adapters is not well documented; we'll fix this. Here is what you need to write in the relevant config file for MongoDB:

readModelAdapters: [
  {
    name: 'default',
    module: 'resolve-readmodel-mongo',
    options: {
      url: 'mongodb://127.0.0.1:27017/MyDatabaseName',
    }
  }
]

Since you have a database engine anyway, you can use it for the event store too:

storageAdapter: {
  module: 'resolve-storage-mongo',
  options: {
    url: 'mongodb://127.0.0.1:27017/MyDatabaseName',
    collectionName: 'Events'
  }
}

View Model:

A view model is built on the fly during the query. It does not require storage, but it reads all events for the given aggregateId.

reSolve view models use snapshots. So if you have 15,000 events for a given aggregate, then on the first request all of those events are applied to calculate the view state. After this, the state is saved, and all subsequent requests read the snapshot plus any later events. By default, a snapshot is taken every 100 events, so on the second query reSolve reads the snapshot for this view model and applies no more than 100 events to it.
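The snapshot bookkeeping above reduces to simple arithmetic. A sketch, assuming a snapshot is written every `bucketSize` events (the function name is illustrative):

```javascript
// Events that must be replayed on top of the latest snapshot,
// assuming snapshots are taken every `bucketSize` events.
function eventsToReplay(totalEvents, bucketSize = 100) {
  return totalEvents % bucketSize;
}

// First query: all 15,000 events are applied to build the state.
// Later queries: only the tail after the last snapshot is replayed.
console.log(eventsToReplay(15000)); // → 0 (a snapshot lands exactly at event 15,000)
console.log(eventsToReplay(15042)); // → 42
```

So even with thousands of events per aggregate, steady-state queries replay at most `bucketSize - 1` events.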

Again, keep in mind that if you want snapshot storage to be persistent, you should configure a snapshot adapter:

snapshotAdapter: {
  module: 'resolve-snapshot-lite',
  options: {
    pathToFile: 'path/to/file',
    bucketSize: 100
  }
}

A view model has one more benefit: if you use the resolve-redux middleware on the client, it is kept up to date there, reactively applying events that the app receives via WebSockets.
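Conceptually, a view model is a Redux-style reducer folded over one aggregate's event stream. A minimal self-contained sketch (the handler shape mirrors the reducer pattern; event types and payloads are illustrative assumptions):

```javascript
// View model: Redux-style reducers applied to a single aggregate's events.
const viewModel = {
  Init: () => ({ items: [] }),
  ITEM_ADDED: (state, event) => ({
    ...state,
    items: state.items.concat(event.payload.text)
  })
};

// Build the state on the fly from the events of one aggregateId.
const events = [
  { type: 'ITEM_ADDED', aggregateId: 'list-1', payload: { text: 'first' } },
  { type: 'ITEM_ADDED', aggregateId: 'list-1', payload: { text: 'second' } }
];

const state = events.reduce(
  (s, e) => (viewModel[e.type] ? viewModel[e.type](s, e) : s),
  viewModel.Init()
);
console.log(state.items); // → ['first', 'second']
```

Because the state is just a fold over events, the client can keep it current by applying each newly received event with the same reducers.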

Upvotes: 2
