Blankman
Blankman

Reputation: 267010

Batch operations with akka persistance, what options are there?

Say I am using akka persistance and I am backing things like Users.

If there is a nighly job that requires scanning all users, and any user that has expired to mark their object as expired.

In a more traditional setup using sql, you would just do:

update u
  set u.is_expired=1
from users u
where u.expired_at >= getdate()

Now if you did that, your akka persistance would be out of synch and you would have to somehow broadcast to all actors to reload.

Or you would have to send a broadcast to all actors to check if you have expired.

If you have millions of users, what realistic options do you have? If this was a database stored procedure this type of query could be done in a matter of seconds.

Trying to understand how this would be done with akka and akka-persistance.

Upvotes: 0

Views: 159

Answers (1)

Levi Ramsey
Levi Ramsey

Reputation: 20551

There are two readily accessible ways to do things like this with Akka Persistence. Both leverage Persistence Query to query the event stream.

If there's a small number of entities, you can use the currentPersistenceIds() query (this query is supported by all the persistence implementations I'm aware of) to get a stream of the entities which exist at that time (stream throttling and backpressure may come in handy here) and send a command to each entity's associated persistent actor to check for expiration.

After a certain point, it may make sense to have a separate database which maintains a view over the entities mapping entity ID to expiration time. For this, you would likely use the eventsByTag query to get a stream of events tagged with, e.g., "affects-expiration"; a later stage in the stream then updates that database. A batch job can then query that DB and issue expiration commands.

An alternative to a DB would be to have a persistent actor which maintains a set of unexpired entities and their expiration times. This actor could be a singleton or could be sharded in such a way as to be able to consistently determine which particular actor would be maintaining the expiration time for a given entity. It could be updated via either the eventsByTag stream (which is eventually consistent) or by the entity actors themselves (much more strongly consistent, but in general be mindful about not having more consistency than you need).

Upvotes: 1

Related Questions