Yuci

Apache Ignite pods restarted with failure: Topology is not initialized

I used the Ignite Helm chart stable/ignite, version 2.7.6, to set up an Ignite cluster on Kubernetes.

But very soon I get errors like:

JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Topology is not initialized: app-profiles]]

As a result, the Ignite pods on Kubernetes restart again and again.

The related cache, app-profiles, is configured like this:

<bean class="org.apache.ignite.configuration.CacheConfiguration">
   <property name="name" value="app-profiles" />
   <property name="cacheMode" value="LOCAL" />
   <property name="onheapCacheEnabled" value="true" />
   <property name="evictionPolicy">
      <bean class="org.apache.ignite.cache.eviction.lru.LruEvictionPolicy">
         <property name="maxSize" value="10000" />
      </bean>
   </property>
   <property name="expiryPolicyFactory">
      <bean id="expiryPolicy" class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
         <constructor-arg>
            <bean class="javax.cache.expiry.Duration">
               <constructor-arg value="SECONDS" />
               <constructor-arg value="43200" />
            </bean>
         </constructor-arg>
      </bean>
   </property>
</bean>

Full stack trace:

[11:56:56,148][SEVERE][client-connector-#60][ClientListenerNioListener] Failed to process client request [req=o.a.i.i.processors.platform.client.cache.ClientCachePutRequest@1d4811d1]
java.lang.IllegalStateException: Topology is not initialized: app-profiles
        at org.apache.ignite.internal.processors.cache.CacheGroupContext.topology(CacheGroupContext.java:587)
        at org.apache.ignite.internal.processors.cache.GridCacheContext.topology(GridCacheContext.java:882)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:2179)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:2157)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:845)
        at org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:207)
        at org.apache.ignite.internal.processors.cache.GridCacheUtils.unwindEvicts(GridCacheUtils.java:888)
        at org.apache.ignite.internal.processors.cache.GridCacheGateway.leaveNoLock(GridCacheGateway.java:240)
        at org.apache.ignite.internal.processors.cache.GridCacheGateway.leave(GridCacheGateway.java:225)
        at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onLeave(GatewayProtectedCacheProxy.java:1578)
        at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:823)
        at org.apache.ignite.internal.processors.platform.client.cache.ClientCachePutRequest.process(ClientCachePutRequest.java:43)
        at org.apache.ignite.internal.processors.platform.client.ClientRequestHandler.handle(ClientRequestHandler.java:57)
        at org.apache.ignite.internal.processors.odbc.ClientListenerNioListener.onMessage(ClientListenerNioListener.java:162)
        at org.apache.ignite.internal.processors.odbc.ClientListenerNioListener.onMessage(ClientListenerNioListener.java:45)
        at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onMessageReceived(GridNioFilterChain.java:279)
        at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
        at org.apache.ignite.internal.util.nio.GridNioAsyncNotifyFilter$3.body(GridNioAsyncNotifyFilter.java:97)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at org.apache.ignite.internal.util.worker.GridWorkerPool$1.run(GridWorkerPool.java:70)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
[11:56:56,151][SEVERE][ttl-cleanup-worker-#41][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Topology is not initialized: app-profiles]]
java.lang.IllegalStateException: Topology is not initialized: app-profiles
        at org.apache.ignite.internal.processors.cache.CacheGroupContext.topology(CacheGroupContext.java:587)
        at org.apache.ignite.internal.processors.cache.GridCacheContext.topology(GridCacheContext.java:882)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:2179)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:2157)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:845)
        at org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:207)
        at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:139)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at java.lang.Thread.run(Thread.java:748)
[11:56:56,152][WARNING][ttl-cleanup-worker-#41][FailureProcessor] No deadlocked threads detected.
[11:56:56,210][WARNING][ttl-cleanup-worker-#41][FailureProcessor] Thread dump at 2020/05/22 11:56:56 GMT
Thread [name="Thread-32", id=708, state=TIMED_WAITING, blockCnt=0, waitCnt=10]
    Lock [object=java.util.concurrent.SynchronousQueue$TransferStack@27a8c714, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Thread [name="sys-#650", id=707, state=TIMED_WAITING, blockCnt=0, waitCnt=1]
    Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@43cf0185, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Thread [name="sys-#649", id=706, state=TIMED_WAITING, blockCnt=0, waitCnt=1]
    Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@43cf0185, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

...

Thread [name="Signal Dispatcher", id=4, state=RUNNABLE, blockCnt=0, waitCnt=0]

Thread [name="Finalizer", id=3, state=WAITING, blockCnt=35, waitCnt=21]
    Lock [object=java.lang.ref.ReferenceQueue$Lock@27fdc7bd, ownerName=null, ownerId=-1]
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)

Thread [name="Reference Handler", id=2, state=WAITING, blockCnt=21, waitCnt=21]
    Lock [object=java.lang.ref.Reference$Lock@2cfe1952, ownerName=null, ownerId=-1]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

Thread [name="main", id=1, state=WAITING, blockCnt=1, waitCnt=107]
    Lock [object=java.util.concurrent.CountDownLatch$Sync@73b574bf, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
        at o.a.i.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:334)



[11:56:56,214][SEVERE][ttl-cleanup-worker-#41][] JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Topology is not initialized: app-profiles]]

Answers (1)

Yuci

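It turned out the problem is the LOCAL cacheMode. Both stack traces fail while Ignite is expiring entries (GridCacheTtlManager.expire calling into GridCacheOffheapManager.expire), a path that asks for the cache's topology, and in Ignite 2.7.6 a cache in LOCAL mode apparently never gets its topology initialized. Simply replacing LOCAL with the PARTITIONED cacheMode makes the problem go away: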

<bean class="org.apache.ignite.configuration.CacheConfiguration">
   <property name="name" value="app-profiles" />
   <!-- <property name="cacheMode" value="LOCAL" /> -->
   <property name="cacheMode" value="PARTITIONED" />
   <property name="onheapCacheEnabled" value="true" />
   <property name="evictionPolicy">
      <bean class="org.apache.ignite.cache.eviction.lru.LruEvictionPolicy">
         <property name="maxSize" value="10000" />
      </bean>
   </property>
   <property name="expiryPolicyFactory">
      <bean id="expiryPolicy" class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
         <constructor-arg>
            <bean class="javax.cache.expiry.Duration">
               <constructor-arg value="SECONDS" />
               <constructor-arg value="43200" />
            </bean>
         </constructor-arg>
      </bean>
   </property>
</bean>

The REPLICATED cacheMode should also work.
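A sketch of the REPLICATED variant, not tested here: it assumes only the cacheMode value needs to change, so the remaining properties are abbreviated in a comment. With REPLICATED, every server node keeps a full copy of the cache, so it costs more memory per node than PARTITIONED for the same data.

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
   <property name="name" value="app-profiles" />
   <!-- REPLICATED instead of LOCAL: a full copy of the cache on every server node -->
   <property name="cacheMode" value="REPLICATED" />
   <!-- onheapCacheEnabled, evictionPolicy and expiryPolicyFactory
        stay exactly as in the PARTITIONED configuration above -->
</bean>
```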
