arupc

Reputation: 395

What is the best way to cache large data objects in Hazelcast?

We have around 20k merchants, and each merchant's data is around 3MB. If we cache all of this data together as a single object, Hazelcast performance is not good. On the other hand, if we cache all 20k merchants individually, a "get all merchants" call slows down, because reading each merchant from the cache separately costs a lot of network time.

How should we partition this data? What should the partition key be? What should the maximum size per partition be?

The Merchant entity has the following attributes: merchant id, parent merchant id, name, address, contacts, status, type.

Merchant id is the unique attribute.

Please suggest

Upvotes: 0

Views: 1146

Answers (3)

wildnez

Reputation: 1098

I would strongly recommend that you break your object down from 3MB to a few tens of KBs, otherwise you will run into problems that are not particularly related to Hazelcast: fat packets blocking other packets and causing heavy latency in read/write operations, heavy serialization/deserialization overhead, a choked network, etc. You have already identified high network time, and it is not going to go away without flattening the value object. If yours is a read-heavy use case, I also suggest looking into Near Cache for ultra-low-latency reads; a configuration sketch follows.
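For illustration only, a minimal member-side Near Cache sketch in Java, assuming a map named "merchants" (the map name and settings are my assumptions, not from the question; Near Cache can also be configured on the client side):

    import com.hazelcast.config.Config;
    import com.hazelcast.config.InMemoryFormat;
    import com.hazelcast.config.MapConfig;
    import com.hazelcast.config.NearCacheConfig;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;

    public class NearCacheSetup {
        public static void main(String[] args) {
            // Near Cache keeps recently read entries locally, so repeated reads of the
            // same merchant avoid the network hop to the owning partition.
            NearCacheConfig nearCache = new NearCacheConfig("merchants")
                    .setInMemoryFormat(InMemoryFormat.OBJECT) // store deserialized for fast reads
                    .setInvalidateOnChange(true);             // drop the local copy when the entry changes

            Config config = new Config();
            config.addMapConfig(new MapConfig("merchants").setNearCacheConfig(nearCache));

            HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        }
    }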

As for partition size, keep it under 100MB; I'd say between 50 and 100MB per partition. Simple maths will help you:

3MB/object x 20k objects = 60GB
Default partition count = 271
Each partition size = 60,000 MB / 271 = 221MB
So increasing the partition count to, let's say, 751 will mean:
60,000 MB / 751 = 80MB

So you can go with the partition count set to 751. To cater to a possible increase in future traffic, I'd set the partition count to an even higher number, such as 881; see the configuration sketch at the end of this answer.

Note: Always use a prime number for partition count.

FYI: in one of the future releases, the default partition count will be changed from 271 to 1999.
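A minimal sketch of how the partition count can be raised through the Java member config (my example; the same property can equally be set in hazelcast.xml/yaml):

    import com.hazelcast.config.Config;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;

    public class PartitionCountSetup {
        public static void main(String[] args) {
            Config config = new Config();

            // hazelcast.partition.count must be identical on every member of the cluster;
            // 751 is the prime suggested above (60,000 MB / 751 is roughly 80 MB per partition).
            config.setProperty("hazelcast.partition.count", "751");

            HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        }
    }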

Upvotes: 0

Scott M

Reputation: 84

Adding to what Mike said, it's not unusual to see Hazelcast maps with millions of entries, so I wouldn't be concerned with the number of entries.

You should structure your map(s) to fit your application's design needs. Doing a 'getAll' on a single map seems inefficient to me. It may make more sense to create multiple maps, or to use a complex key that allows you to be more selective with the entries returned.

Also, you may want to look at indexes. You can index the key and/or value, which can really help with performance. Predicates you construct for selections will automatically use any defined indexes; a rough sketch is below.
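A rough sketch of an indexed predicate query, assuming a map named "merchants" and a value with a parentMerchantId attribute (both names and the stub class are my assumptions; this uses the Hazelcast 4.x+ API, the 3.x index API differs slightly):

    import com.hazelcast.config.Config;
    import com.hazelcast.config.IndexConfig;
    import com.hazelcast.config.IndexType;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.map.IMap;
    import com.hazelcast.query.Predicates;

    import java.io.Serializable;
    import java.util.Collection;

    public class MerchantQueryExample {

        // Minimal stand-in for the Merchant entity from the question.
        public static class Merchant implements Serializable {
            public String merchantId;
            public String parentMerchantId;
            public String name;
        }

        public static void main(String[] args) {
            Config config = new Config();
            // Hash index on the value's parentMerchantId attribute
            config.getMapConfig("merchants")
                  .addIndexConfig(new IndexConfig(IndexType.HASH, "parentMerchantId"));

            HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
            IMap<String, Merchant> merchants = hz.getMap("merchants");

            // The predicate is evaluated on the members and can use the index,
            // so only the matching merchants travel over the network.
            Collection<Merchant> children =
                    merchants.values(Predicates.equal("parentMerchantId", "M-1001"));
        }
    }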

Upvotes: 1

Mike Yawn

Reputation: 886

I wouldn't worry about changing the partition key unless you have reason to believe the default partitioning scheme is not giving you a good distribution of keys.

With 20K merchants and 3MB of data per merchant, your total data is around 60GB. How many nodes are you using for your cache, and what memory size per node? Distributing the cache across a larger number of nodes should give you more effective bandwidth.

Make sure you're using an efficient serialization mechanism; the default Java serialization is very inefficient (both in terms of object size and the speed of serializing and deserializing). Using something like IdentifiedDataSerializable (if Java) or Portable (if using non-Java clients) could help a lot; a sketch is below.
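As an illustration, a sketch of the Merchant entity implementing IdentifiedDataSerializable. The factory/class IDs and the subset of fields are assumptions, and you would also register a matching DataSerializableFactory in the config; writeString/readString are the 4.1+ method names, older versions use writeUTF/readUTF:

    import com.hazelcast.nio.ObjectDataInput;
    import com.hazelcast.nio.ObjectDataOutput;
    import com.hazelcast.nio.serialization.IdentifiedDataSerializable;

    import java.io.IOException;

    public class Merchant implements IdentifiedDataSerializable {

        // Hypothetical IDs; they must match a DataSerializableFactory registered in the config.
        public static final int FACTORY_ID = 1;
        public static final int CLASS_ID = 1;

        private String merchantId;
        private String parentMerchantId;
        private String name;
        private String status;

        public Merchant() {
            // no-arg constructor required for deserialization
        }

        @Override
        public int getFactoryId() {
            return FACTORY_ID;
        }

        @Override
        public int getClassId() {
            return CLASS_ID;
        }

        @Override
        public void writeData(ObjectDataOutput out) throws IOException {
            out.writeString(merchantId);
            out.writeString(parentMerchantId);
            out.writeString(name);
            out.writeString(status);
        }

        @Override
        public void readData(ObjectDataInput in) throws IOException {
            merchantId = in.readString();
            parentMerchantId = in.readString();
            name = in.readString();
            status = in.readString();
        }
    }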

Upvotes: 0
