Daniel
Daniel

Reputation: 1072

Eclipse Ditto throws akka exception after deployment on EKS

I'm running Eclipse Ditto v2.5.0 on EKS (helm chart) and after a couple of days the service stops working. It doesn't return any results nor is persisting new things working. I've found the following in the logs:

2022-06-28T08:06:12+02:00 Caused by: akka.stream.RemoteStreamRefActorTerminatedException: [SourceRef-139] Remote partner [Actor[akka://[email protected]:2551/system/Materializers/StreamSupervisor-0/$$q2c-SinkRef-139#-1677314214]] has terminated unexpectedly and no clean completion/failure message was received (possible reasons: network partition or subscription timeout triggered termination of partner). Tearing down.
2022-06-28T08:06:12+02:00   at akka.stream.impl.streamref.SourceRefStageImpl$$anon$1.onTimer(SourceRefImpl.scala:374)
2022-06-28T08:06:12+02:00   at akka.stream.stage.TimerGraphStageLogic.onInternalTimer(GraphStage.scala:1665)
2022-06-28T08:06:12+02:00   at akka.stream.stage.TimerGraphStageLogic.$anonfun$getTimerAsyncCallback$1(GraphStage.scala:1654)
2022-06-28T08:06:12+02:00   at akka.stream.stage.TimerGraphStageLogic.$anonfun$getTimerAsyncCallback$1$adapted(GraphStage.scala:1654)
2022-06-28T08:06:12+02:00   at akka.stream.impl.fusing.GraphInterpreter.runAsyncInput(GraphInterpreter.scala:467)
2022-06-28T08:06:12+02:00   at akka.stream.impl.fusing.GraphInterpreterShell$AsyncInput.execute(ActorGraphInterpreter.scala:517)
2022-06-28T08:06:12+02:00   at akka.stream.impl.fusing.GraphInterpreterShell.processEvent(ActorGraphInterpreter.scala:625)
2022-06-28T08:06:12+02:00   at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:800)
2022-06-28T08:06:12+02:00   at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:818)
2022-06-28T08:06:12+02:00   at akka.actor.Actor.aroundReceive(Actor.scala:537)
2022-06-28T08:06:12+02:00   at akka.actor.Actor.aroundReceive$(Actor.scala:535)
2022-06-28T08:06:12+02:00   at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:716)
2022-06-28T08:06:12+02:00   ... 10 common frames omitted
2022-06-28T08:06:12+02:00 2022-06-28 08:06:12,408 ERROR [] o.e.d.i.u.a.ThingsAggregatorProxyActor akka://ditto-cluster/user/gatewayRoot/proxy/aggregatorProxy - [retrieve-thing-response] Upstream failed.
2022-06-28T08:06:12+02:00 akka.stream.RemoteStreamRefActorTerminatedException: [SourceRef-137] Remote partner [Actor[akka://[email protected]:2551/system/Materializers/StreamSupervisor-0/$$m2c-SinkRef-137#934810721]] has terminated unexpectedly and no clean completion/failure message was received (possible reasons: network partition or subscription timeout triggered termination of partner). Tearing down.
2022-06-28T08:06:12+02:00   at akka.stream.impl.streamref.SourceRefStageImpl$$anon$1.onTimer(SourceRefImpl.scala:374)
2022-06-28T08:06:12+02:00   at akka.stream.stage.TimerGraphStageLogic.onInternalTimer(GraphStage.scala:1665)
2022-06-28T08:06:12+02:00   at akka.stream.stage.TimerGraphStageLogic.$anonfun$getTimerAsyncCallback$1(GraphStage.scala:1654)
2022-06-28T08:06:12+02:00   at akka.stream.stage.TimerGraphStageLogic.$anonfun$getTimerAsyncCallback$1$adapted(GraphStage.scala:1654)
2022-06-28T08:06:12+02:00   at akka.stream.impl.fusing.GraphInterpreter.runAsyncInput(GraphInterpreter.scala:467)
2022-06-28T08:06:12+02:00   at akka.stream.impl.fusing.GraphInterpreterShell$AsyncInput.execute(ActorGraphInterpreter.scala:517)
2022-06-28T08:06:12+02:00   at akka.stream.impl.fusing.GraphInterpreterShell.processEvent(ActorGraphInterpreter.scala:625)
2022-06-28T08:06:12+02:00   at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:800)
2022-06-28T08:06:12+02:00   at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:818)
2022-06-28T08:06:12+02:00   at akka.actor.Actor.aroundReceive(Actor.scala:537)
2022-06-28T08:06:12+02:00   at akka.actor.Actor.aroundReceive$(Actor.scala:535)
2022-06-28T08:06:12+02:00   at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:716)
2022-06-28T08:06:12+02:00   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
2022-06-28T08:06:12+02:00   at akka.actor.ActorCell.invoke(ActorCell.scala:548)
2022-06-28T08:06:12+02:00   at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:800)
2022-06-28T08:06:12+02:00   at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:818)
2022-06-28T08:06:12+02:00   at akka.actor.Actor.aroundReceive(Actor.scala:537)
2022-06-28T08:06:12+02:00   at akka.actor.Actor.aroundReceive$(Actor.scala:535)
2022-06-28T08:06:12+02:00   at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:716)
2022-06-28T08:06:12+02:00   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
2022-06-28T08:06:12+02:00   at akka.actor.ActorCell.invoke(ActorCell.scala:548)
2022-06-28T08:06:12+02:00   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
2022-06-28T08:06:12+02:00   at akka.dispatch.Mailbox.run(Mailbox.scala:231)
2022-06-28T08:06:12+02:00   at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
2022-06-28T08:06:12+02:00   at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
2022-06-28T08:06:12+02:00   at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
2022-06-28T08:06:12+02:00   at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
2022-06-28T08:06:12+02:00   at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
2022-06-28T08:06:12+02:00   at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
2022-06-28T08:06:12+02:00 2022-06-28 08:06:12,410 ERROR [78dae9eb-4515-4513-9930-3060f7ba9652] o.e.d.g.s.e.a.HttpRequestActor akka://ditto-cluster/user/$Xe - Got <Status.Failure> when a command response was expected: <akka.stream.RemoteStreamRefActorTerminatedException: [SourceRef-137] Remote partner [Actor[akka://[email protected]:2551/system/Materializers/StreamSupervisor-0/$$m2c-SinkRef-137#934810721]] has terminated unexpectedly and no clean completion/failure message was received (possible reasons: network partition or subscription timeout triggered termination of partner). Tearing down.>!
2022-06-28T08:06:12+02:00 java.util.concurrent.CompletionException: akka.stream.RemoteStreamRefActorTerminatedException: [SourceRef-137] Remote partner [Actor[akka://[email protected]:2551/system/Materializers/StreamSupervisor-0/$$m2c-SinkRef-137#934810721]] has terminated unexpectedly and no clean completion/failure message was received (possible reasons: network partition or subscription timeout triggered termination of partner). Tearing down.
2022-06-28T08:06:12+02:00   at org.eclipse.ditto.gateway.service.endpoints.actors.AbstractHttpRequestActor.lambda$getResponseAwaitingBehavior$21(AbstractHttpRequestActor.java:387)
2022-06-28T08:06:12+02:00   at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
2022-06-28T08:06:12+02:00   at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
2022-06-28T08:06:12+02:00   at scala.PartialFunction.applyOrElse(PartialFunction.scala:214)

and

2022-06-27T16:22:19+02:00 2022-06-27 16:22:19,305 ERROR [] a.m.c.b.i.HttpContactPointBootstrap akka://[email protected]:2551/system/bootstrapCoordinator/contactPointProbe-10-20-68-87.ditto.pod.cluster.local-8558 - Overdue of probing-failure-timeout, stop probing, signaling that it's failed

How can I debug this and determine what the root cause might be?

Upvotes: 1

Views: 180

Answers (1)

Kalin Kostashki
Kalin Kostashki

Reputation: 21

The logs in indicate that you have tried to get several things via HTTP. The gateway service received this error as we see in:

2022-06-28T08:06:12+02:00   at org.eclipse.ditto.gateway.service.endpoints.actors.AbstractHttpRequestActor.lambda$getResponseAwaitingBehavior$21(AbstractHttpRequestActor.java:387)

The ThingsAggregatorProxyActor is used to get the each thing you requested from the things service in your EKS.

  1. I would check the ditto health endpoint.

    Assuming you use a nginx in your EKS you should be able to call it using the devops user under localhost:30080/status/health >>> Source

    If you aren't using nginx just call the gateway pod. For example: gateway:8080/status/health

  2. Check the logs of the things pod as well and also if the pod was restarted or had any kinds of issues.

Upvotes: 2

Related Questions