Reputation: 142
Ignite logs have starvation waringings and stop to provide service:
[12:55:22,080][WARNING][grid-timeout-worker-#71][G] >>> Possible starvation in striped pool.
Thread name: sys-stripe-25-#26
Deadlock: false
Completed: 16272032
Thread [name="sys-stripe-25-#26", id=51, state=WAITING, blockCnt=79, waitCnt=15616666]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.remove0(GridDhtAtomicCache.java:716)
at o.a.i.i.processors.cache.GridCacheAdapter.remove(GridCacheAdapter.java:3084)
at o.a.i.i.processors.cache.GridCacheAdapter.remove(GridCacheAdapter.java:3065)
at o.a.i.i.processors.cache.IgniteCacheProxyImpl.remove(IgniteCacheProxyImpl.java:1131)
at o.a.i.i.processors.cache.GatewayProtectedCacheProxy.remove(GatewayProtectedCacheProxy.java:998)
at com.test.info.TestInfoBasicExecutor.handleCurrentLevel(TestInfoBasicExecutor.java:281)
at com.test.info.TestInfoBasicExecutor$infoEntryProcessor.process(TestInfoBasicExecutor.java:514)
at com.test.info.TestInfoBasicExecutor$infoEntryProcessor.process(TestInfoBasicExecutor.java:453)
at o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.runEntryProcessor(GridCacheMapEntry.java:5142)
at o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:4550)
at o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:4367)
at o.a.i.i.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:3051)
at o.a.i.i.processors.cache.persistence.tree.BPlusTree$Invoke.access$6200(BPlusTree.java:2945)
at o.a.i.i.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1717)
at o.a.i.i.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1600)
at o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1199)
at o.a.i.i.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1357)
at o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:345)
at o.a.i.i.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:1767)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2420)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:1883)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1736)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1628)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3055)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:130)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:266)
at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:261)
at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060)
at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555)
at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1183)
at o.a.i.i.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
at o.a.i.i.managers.communication.GridIoManager$9.run(GridIoManager.java:1090)
at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:505)
at java.lang.Thread.run(Thread.java:745)
I use invoke to update Cache A, and in the etnryprocessor of cache A, I konw the processor is already invoked wiht a lock, and i just doing update for another cacher base this entry, I have checked the value of Cache A, and based on the value, do update to cache B entries, i.e. put or remove, in my test, put is ok, but for remove it seems the remove cause service hangs:
at com.test.info.TestInfoBasicExecutor.handleCurrentLevel(TestInfoBasicExecutor.java:281)
at com.test.info.TestInfoBasicExecutor$infoEntryProcessor.process(TestInfoBasicExecutor.java:514)
at com.test.info.TestInfoBasicExecutor$infoEntryProcessor.process(TestInfoBasicExecutor.java:453)
Update 0702:
To prevent the starvation, i changed my code:
In Ignite Service A's excute function:
cacheA.invoke(record){ // do process to record
igniteQueue.put(processed_record);
}
In Ignite Service B's excute function:
saved_processed_record = igniteQueue.take();
=================
I have try to use this way to prevent the starvation, It runs smoothly when the old code with starvation(TPS is low), but when i running with high TPS, the "Possible starvation in striped pool" back again,
It seems I use igniteQueue in cache.invoke is also not correct vs. previous cache in cache.invoke
When i want is do process for each record in cache, and then base the processed record to update other caches, but it seems it's not possilbe?
Upvotes: 0
Views: 376
Reputation: 8390
You should avoid doing cache operations within the entry processor, even if those operations belong to other caches. The reason for that is that all these operations will use the same thread pool - this can cause starvation.
Upvotes: 1
Reputation: 736
Striped pool used for processing of the Ignite messages. Looks like for some reason all the threads from this pool is waiting for some operation (remove from cache in your log). It could be related to network problems or removing takes a lot of time (for example you are going to remove all the data).
Could you please attach the thread dump and your test code for investigation?
Upvotes: 1