Reputation: 1313

Understanding Akka cluster sharding

I'm learning Akka sharding module. There is something I don't understand abour Sharding. Let's imagine you want to shard an actor : you have many entites from the same actor distribued on many nodes. Each entity can have its own state, which may differ from another entity.

A client is making a request (sending a message) to your shard actor to get back its status value. This is message is going to be processed by an entity and giving back its value as a result. But if it were treated by another entity the result would be different. But it should be the same because all entites derive from the same actor, shouldn'it ?

Upvotes: 2

Answers (3)

Yuriy Tumakha

Reputation: 1570

In Akka cluster sharding each actor should have a unique name(usually entity id) and represent a unique entity. When an actor starting/restarting entity loaded (usually from database) into actor state.

If an actor receives messages to update the entity then the actor should update database and actor state, if an actor receives messages to read entity then the actor should read entity from actor state only (it is guaranteed to be the same as in database as all update operations handled by only one actor).

If any node failed or in case of cluster scaling then actor corresponding for the requested entity can be recreated on another node, shard region.

Upvotes: 0

yiksanchan

Reputation: 1940

It seems you misunderstand the concept of Akka cluster sharding, let me explain with an example.

Let's say your service is responsible for responding with user profiles to requests. And to gain extremely low latency, you decide to use Akka actors to cache the user profiles in memory rather than having to query DB per request.

If your website only has 10 users and each user profile is just a few KB, you can hold all 10 user profiles in a single actor without issue, and you won't need cluster sharding for sure. However, if you have 10 million users, probably the 10 million user profiles won't fit into a single actor's memory, also it'd expensive if the actor goes down, as it means you need a large DB query to gain these data back from persistence.

In this scenario, cluster sharding is a fit. You will have 10 million Akka actors, distributed across your cluster, and each actor stores only 1 user profile. So GetUserProfile(userProfileId = 123) won't give you different response - it will always be routed to THE actor that holds user profile for the user 123, thus the response will always be the same.

How does the routing work? Check extractShardId and extractEntityId in the doc

Upvotes: 9

David Ogren

Reputation: 4810

But [the message response] should be the same because all entites derive from the same actor, shouldn'it ?

No, every actor has their own state and represents something different. If you had a Customer class, you wouldn't expect each Customer object to have the same data. Every customer object would have it's own name, id, etc.

The same is true for Actors. Actors have their own state and represent some kind of domain entity. If you send a GetCustomerName message to an actor, you would expect each Actor to give you a different name.

This is especially true for Cluster Sharding. The point of Cluster Sharding is so that you can scale past a single node: either for scalability, elasticity, or resilience. But they are still Actors each with its own state. Sending a GetCustomerName will (and should) give you a different response from every different actor. Sharding just gives you the ability to distribute those actors across multiple machines and have the location of the actor be transparent to the sender.

Upvotes: 0

Understanding Akka cluster sharding

Answers (3)

Related Questions