Reputation: 4391
We have a clustered application with a cache at its core. The cache holds calculated data, built up from the underlying database (which is replicated too). It is used for fast look-ups on streaming data, to make routing decisions on the fly.
Since the data from which the cache is calculated can be changed from any node in the cluster, we are thinking of making the cache replicated too, so that we do not have to listen, in a database-specific manner, for replicated changes coming in from other nodes in the system.
We have identified Ehcache as the likely cache implementation (Infinispan and Hazelcast with a near cache being the other contenders). So far so good.
What I need to know is how this cache will be repopulated when the individual nodes or the entire application restarts. All the data that a node requires is present in the underlying database. So can each node simply load the data from its database and populate the cache after restarts? How will the cache then bring the cluster nodes to a single state?
I understand that there are disk-persisted caches. Is that the way to go with Ehcache? I did not want to use that as a first option, since the database is the single source of all the data and the central authority for deciding which data is right.
Is there a way an app can reload the cache with all the data at restart and have all the caches do the same and sync up the deltas? I suspect this is hard to do and might not be viable, since all the nodes would have to send all their keys back and forth. But I wanted to be sure, and to know whether there are other strategies for handling this scenario.
Upvotes: 3
Views: 1477
Reputation: 733
I will try to present a possible solution from the Infinispan point of view.
For your use case I would suggest a cluster of standalone Infinispan nodes (client-server access, for example via Hot Rod) with an underlying cache store configured.
https://docs.jboss.org/author/display/ISPN/Infinispan+Server & https://docs.jboss.org/author/display/ISPN/Using+Hot+Rod+Server
Now more specifically to your questions:
What I need to know is how this cache will be repopulated when the individual nodes or the entire application restarts.
That is why I suggest an "independent" cluster with remote client-server access, so that your caching layer does not depend directly on your application. When the application restarts, crashes, or is redeployed, the cached data is still in the Infinispan cluster, and the app simply reconnects to it once it is up again.
All the data that a node requires is present in the underlying database. So can each node simply load the data from its database and populate the cache after restarts?
In Infinispan, yes. Nodes are able to load data from a cache store. Please see this section: https://docs.jboss.org/author/display/ISPN/Cache+Loaders+and+Stores and especially the preload configuration element, which pre-loads data from the store into the cache when the cache starts.
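To make the preload idea concrete, here is a minimal plain-Java sketch of what "load everything from the store when the cache starts" amounts to. It deliberately does not use Infinispan's API: PreloadingCache, start() and the Supplier standing in for the database-backed store are all hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Illustration only: hypothetical class, not part of Infinispan.
class PreloadingCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    // Stands in for the cache store backed by the authoritative database.
    private final Supplier<Map<K, V>> store;

    PreloadingCache(Supplier<Map<K, V>> store) {
        this.store = store;
    }

    // Called once when the node starts: drop any stale in-memory state
    // and copy every entry from the store into memory.
    void start() {
        entries.clear();
        entries.putAll(store.get());
    }

    V get(K key) {
        return entries.get(key);
    }

    int size() {
        return entries.size();
    }
}
```

Because the store is always read at start, a restarted node never depends on what it held in memory before the restart; the database remains the authority.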
How will the cache then bring the cluster nodes to a single state?
Infinispan has state transfer. The new implementation (https://community.jboss.org/wiki/Non-BlockingStateTransferV2) no longer even blocks your cluster while new nodes are joining. State transfer simply takes care of distributing your data across the cluster after join/leave changes.
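As a rough mental model only, state transfer on join can be pictured as the joiner receiving the current entries from a node that is already a member. The sketch below is plain Java with hypothetical names (ClusterNode, joinVia); it says nothing about how Infinispan implements the non-blocking transfer, and it assumes a fully replicated cache where every member holds every entry.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: hypothetical class, not part of Infinispan.
class ClusterNode {
    final String name;
    final Map<String, String> entries = new HashMap<>();

    ClusterNode(String name) {
        this.name = name;
    }

    // On join, copy the current state from an existing member so the
    // new node starts out with the same view of the data.
    void joinVia(ClusterNode member) {
        entries.putAll(member.entries);
    }
}
```

In this picture a restarted node does not need to exchange keys with every peer; it only needs the state from the members that already hold it.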
Is there a way an app can reload the cache with all the data at restart and have all the caches do the same and sync up the deltas?
Yes, given everything mentioned above. You can simply use replication, so that entries are replicated to all nodes and the state is consistent. However, replication is better suited to read-heavy scenarios. In your case, I suppose distribution mode with an appropriate numOwners setting should be enough.
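The difference between replication and distribution can be illustrated with a toy owner-selection function. This is a simplified sketch under stated assumptions: OwnerSelector and its hash-modulo placement are hypothetical and are not Infinispan's actual consistent-hash algorithm. The point is only that each key lives on a fixed number of nodes, and that setting the number of owners to the cluster size behaves like full replication.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical class, not Infinispan's hashing.
class OwnerSelector {
    // Returns the nodes that would hold a copy of the given key.
    static List<String> owners(String key, List<String> nodes, int numOwners) {
        if (nodes.isEmpty() || numOwners < 1) {
            throw new IllegalArgumentException("need at least one node and one owner");
        }
        // Never ask for more copies than there are nodes.
        int copies = Math.min(numOwners, nodes.size());
        // Pick a starting node from the key's hash, then take the next
        // nodes in order, wrapping around the end of the list.
        int start = Math.floorMod(key.hashCode(), nodes.size());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            result.add(nodes.get((start + i) % nodes.size()));
        }
        return result;
    }
}
```

With four nodes and two owners, each entry is stored twice rather than four times, so a write touches fewer nodes while a single node failure still leaves one copy available.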
Upvotes: 3