user1685095

Reputation: 6121

Can I use Akka Persistence when the actor state only increases in size?

I'm playing with Akka Persistence, trying to implement a service where my state is a potentially very large list of entities (let's say it won't fit in RAM). Let's also say the user wants the full history of all entities to be available. Can I do that with Akka Persistence?

Right now my actor state looks like this:

// Immutable state: each event yields a new System via copy.
case class System(processes: Map[Long, Process] = Map.empty) {

  def updated(event: Event): System = event match {
    // A new process appears: add it to the map.
    case ProcessDetectedEvent(time, activitySets, id, processType) =>
      val process = Process(activitySets.coordinates, time, activitySets.channels, id, processType, false)
      copy(processes = processes + (id -> process))

    // A known process moves: replace its entry, keeping its type.
    case ProcessMovedEvent(id, activitySets, time) =>
      val process = Process(activitySets.coordinates, time, activitySets.channels, id, processes(id).processType, false)
      copy(processes = processes + (id -> process))

    // A process is closed: its entry stays in the map, flagged as closed,
    // so the map never shrinks.
    case ProcessClosedEvent(time, id) =>
      val currentProcess = processes(id)
      val process = Process(currentProcess.coordinates, time, currentProcess.channels, id, currentProcess.processType, true)
      copy(processes = processes + (id -> process))

    case _ => this
  }

}

As you can see, the whole map of processes is kept in memory, so the application can run out of memory if the number of processes grows large.

Upvotes: 5

Views: 1256

Answers (3)

acjay

Reputation: 36571

Perhaps you want to think about whether there are meaningful ways to partition your data set into scopes that do have reasonable bounds. Then you could represent each scope with its own persistent actor, and if you need information that spans your entire store, you would need some kind of coordinator to manage the scopes and iterate over them. But depending on how sophisticated this gets, at some point I'd have to wonder whether you'd be reinventing MapReduce or Spark.
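A minimal, dependency-free sketch of the scoping idea, assuming process ids can be bucketed into fixed-size ranges. `ScopeSize`, `scopeOf` and the `scope-N` persistence-id naming are hypothetical, and the `journal` map merely stands in for the event store keyed by persistence id. Each scope would be recovered by its own persistent actor; the coordinator here visits scopes one at a time, so only one scope's state is materialized at once.

```scala
object ScopedRecovery {
  // Hypothetical simplified event: only a process id and a timestamp.
  final case class Event(processId: Long, time: Long)

  // Hypothetical scoping rule: bucket processes by id range.
  val ScopeSize = 1000L
  def scopeOf(processId: Long): Long = processId / ScopeSize

  // One persistence id (i.e. one persistent actor) per scope.
  def persistenceId(scope: Long): String = s"scope-$scope"

  // Rebuild the state of a single scope from that scope's events only.
  def recoverScope(journal: Map[String, Vector[Event]], scope: Long): Map[Long, Long] =
    journal.getOrElse(persistenceId(scope), Vector.empty)
      .foldLeft(Map.empty[Long, Long])((state, e) => state + (e.processId -> e.time))

  // Coordinator: visits scopes one after another; each recovered scope
  // becomes garbage once its contribution (here, its size) is computed.
  def countAll(journal: Map[String, Vector[Event]], scopes: Seq[Long]): Int =
    scopes.iterator.map(s => recoverScope(journal, s).size).sum
}
```

Counting is just a placeholder for whatever cross-scope query you need; the point of the design is that no step ever holds more than one scope in memory.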

Upvotes: 0

cdmdotnet

Reputation: 1753

I think what you might be looking for (at least it's another option) is snapshots.

When using event sourcing and event replay, the generally advised approach is to take a snapshot every so often.

So when you recover, you get back a snapshot plus the events that took place since that snapshot. This means fewer objects are streamed from your event storage (less memory) and fewer events have to be processed and applied (faster)... but this does come with its own trade-offs, which I won't discuss here.
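Akka Persistence does provide snapshots (`saveSnapshot` in a `PersistentActor`, offered back during recovery as `SnapshotOffer`). The recovery logic itself can be modelled in plain Scala. The sketch below uses hypothetical `Event` and `Snapshot` types with sequence numbers and is not the Akka API: it starts from the snapshot's state and replays only the events with a higher sequence number.

```scala
object SnapshotRecovery {
  // Hypothetical simplified event with a journal sequence number.
  final case class Event(seqNr: Long, processId: Long, time: Long)
  // A snapshot records the state as of a given sequence number.
  final case class Snapshot(seqNr: Long, state: Map[Long, Long])

  // Event handler: remember the latest time per process id.
  def applyEvent(state: Map[Long, Long], e: Event): Map[Long, Long] =
    state + (e.processId -> e.time)

  // Recovery: start from the snapshot (if any) and apply only newer events.
  def recover(snapshot: Option[Snapshot], journal: Seq[Event]): Map[Long, Long] = {
    val (fromSeqNr, initial) = snapshot match {
      case Some(s) => (s.seqNr, s.state)
      case None    => (0L, Map.empty[Long, Long])
    }
    journal.iterator.filter(_.seqNr > fromSeqNr).foldLeft(initial)(applyEvent)
  }
}
```

Note that the recovered state is the same with or without the snapshot; only the number of replayed events shrinks. Snapshots therefore shorten recovery, but a state that exceeds the heap still will not fit.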

Again, this only covers the most common scenarios. If your event handling changes, you may need to rebuild your snapshots from the events, although that would raise some rather serious and interesting questions about how you are building your system.
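One possible way to deal with changed event handling, sketched under the assumption that each snapshot is tagged with the version of the handler that produced it. `handlerVersion` and `CurrentHandlerVersion` are hypothetical names, not part of Akka. A snapshot written by an older handler is ignored and the state is rebuilt from the full event log.

```scala
object VersionedSnapshot {
  final case class Event(seqNr: Long, processId: Long, time: Long)
  // Hypothetical: the snapshot records which handler version built it.
  final case class Snapshot(handlerVersion: Int, seqNr: Long, state: Map[Long, Long])

  // Bump this whenever applyEvent's logic changes.
  val CurrentHandlerVersion = 2

  // Current handler: remember the latest time per process id.
  def applyEvent(state: Map[Long, Long], e: Event): Map[Long, Long] =
    state + (e.processId -> e.time)

  // Trust the snapshot only if the current handler produced it;
  // otherwise discard it and replay the whole journal.
  def recover(snapshot: Option[Snapshot], journal: Seq[Event]): Map[Long, Long] =
    snapshot.filter(_.handlerVersion == CurrentHandlerVersion) match {
      case Some(s) => journal.filter(_.seqNr > s.seqNr).foldLeft(s.state)(applyEvent)
      case None    => journal.foldLeft(Map.empty[Long, Long])(applyEvent)
    }
}
```

The cost of this choice is that the first recovery after a handler change replays everything, which is exactly the expensive case snapshots were meant to avoid.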

I haven't looked too closely, but Akka might have this notion of snapshotting built in. If not, there's a learning curve and lots of trial and error ahead as you start hitting all the less-travelled cases that idealized approaches skip over but the real world throws at you.

Upvotes: -1

Sreenath Chothar

Reputation: 173

Akka Persistence is used by a stateful actor to recover its internal state when the actor is started, restarted after a JVM crash or by a supervisor, or migrated in a cluster. In your case, the application (JVM) may eventually crash with an OutOfMemoryError. When the actor then restarts, the persistence recovery mechanism would recreate all of the Process information in the map, so total memory use would again be high and the application could crash again while running. So persistence alone won't prevent the crash unless you persist only a partial list of processes to reduce memory use.

So first you need to find a way to avoid running out of memory. You could try the following options:

  1. Increase the JVM heap size (e.g. with -Xmx) when restarting the JVM after the OutOfMemoryError.
  2. While recovering the state, replay only a selected subset of the messages, so that total memory use stays low, at the cost of an incomplete state.
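Option 2 can be modelled in plain Scala as below. `keep` (which events to replay) and `replayMax` are hypothetical parameters of this sketch. For reference, classic Akka Persistence lets a `PersistentActor` override `recovery` to return a `Recovery(...)` with limits such as `toSequenceNr` and `replayMax`; check the documentation for your Akka version before relying on it.

```scala
object PartialReplay {
  // Hypothetical simplified event with a journal sequence number.
  final case class Event(seqNr: Long, processId: Long, time: Long)

  // Replays only the events accepted by `keep`, stopping after `replayMax` of them.
  // The result is deliberately incomplete: filtered-out processes are never loaded.
  def recover(journal: Iterator[Event], keep: Event => Boolean, replayMax: Int): Map[Long, Long] =
    journal.filter(keep).take(replayMax).foldLeft(Map.empty[Long, Long]) { (state, e) =>
      state + (e.processId -> e.time) // latest time per process id
    }
}
```

For example, `PartialReplay.recover(events.iterator, _.time >= cutoff, 10000)` would rebuild only recent activity (`cutoff` being whatever window you choose). Since the journal is consumed as an `Iterator`, skipped events are not accumulated in memory.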

If the list of messages to be replayed during recovery is too large, snapshots can be used to reduce the state recovery time.

Upvotes: 0
