r.slesarev

Reputation: 21

How to fix "Tried to associate with unreachable remote address [akka.tcp://actorsystem@address:port]" error?

I deployed 3 lighthouse pods and 3 crawler pods on my Kubernetes cluster, based on this example. Right now the cluster looks like this:

akka.tcp://webcrawler@crawler-1.crawler:5213 | [crawler] | up | 
akka.tcp://webcrawler@crawler-2.crawler:5213 | [crawler] | up | 
akka.tcp://webcrawler@lighthouse-0.lighthouse:4053 | [lighthouse] | up | 
akka.tcp://webcrawler@lighthouse-1.lighthouse:4053 | [lighthouse] | up | 
akka.tcp://webcrawler@lighthouse-2.lighthouse:4053 | [lighthouse] | up | 

As you can see, there's no crawler-0.crawler node. Let's look at that node's logs.

[WARNING][05/26/2020 10:07:24][Thread 0011][[akka://webcrawler/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fwebcrawler%40lighthouse-1.lighthouse%3A4053-940/endpointWriter#501112873]] AssociationError [akka.tcp://webcrawler@crawler-0.crawler:5213] -> akka.tcp://webcrawler@lighthouse-1.lighthouse:4053: Error [Association failed with akka.tcp://webcrawler@lighthouse-1.lighthouse:4053] []
[WARNING][05/26/2020 10:07:24][Thread 0009][[akka://webcrawler/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fwebcrawler%40lighthouse-2.lighthouse%3A4053-941/endpointWriter#592338082]] AssociationError [akka.tcp://webcrawler@crawler-0.crawler:5213] -> akka.tcp://webcrawler@lighthouse-2.lighthouse:4053: Error [Association failed with akka.tcp://webcrawler@lighthouse-2.lighthouse:4053] []
[WARNING][05/26/2020 10:07:24][Thread 0008][remoting] Tried to associate with unreachable remote address [akka.tcp://[email protected]:4053]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [Association failed with akka.tcp://[email protected]:4053] Caused by: [System.AggregateException: One or more errors occurred. (No such device or address) ---> System.Net.Internals.SocketExceptionFactory+ExtendedSocketException: No such device or address
   at System.Net.Dns.InternalGetHostByName(String hostName)
   at System.Net.Dns.ResolveCallback(Object context)
--- End of stack trace from previous location where exception was thrown ---
   at System.Net.Dns.HostResolutionEndHelper(IAsyncResult asyncResult)
   at System.Net.Dns.EndGetHostEntry(IAsyncResult asyncResult)
   at System.Net.Dns.<>c.b__27_1(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at Akka.Remote.Transport.DotNetty.DotNettyTransport.ResolveNameAsync(DnsEndPoint address, AddressFamily addressFamily)
   at Akka.Remote.Transport.DotNetty.DotNettyTransport.DnsToIPEndpoint(DnsEndPoint dns)
   at Akka.Remote.Transport.DotNetty.TcpTransport.MapEndpointAsync(EndPoint socketAddress)
   at Akka.Remote.Transport.DotNetty.TcpTransport.AssociateInternal(Address remoteAddress)
   at Akka.Remote.Transport.DotNetty.DotNettyTransport.Associate(Address remoteAddress)
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Akka.Remote.Transport.ProtocolStateActor.<>c.b__11_54(Task`1 result)
   at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
---> (Inner Exception #0) System.Net.Internals.SocketExceptionFactory+ExtendedSocketException (00000005, 6): No such device or address
   at System.Net.Dns.InternalGetHostByName(String hostName)
   at System.Net.Dns.ResolveCallback(Object context)
--- End of stack trace from previous location where exception was thrown ---
   at System.Net.Dns.HostResolutionEndHelper(IAsyncResult asyncResult)
   at System.Net.Dns.EndGetHostEntry(IAsyncResult asyncResult)
   at System.Net.Dns.<>c.b__27_1(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at Akka.Remote.Transport.DotNetty.DotNettyTransport.ResolveNameAsync(DnsEndPoint address, AddressFamily addressFamily)
   at Akka.Remote.Transport.DotNetty.DotNettyTransport.DnsToIPEndpoint(DnsEndPoint dns)
   at Akka.Remote.Transport.DotNetty.TcpTransport.MapEndpointAsync(EndPoint socketAddress)
   at Akka.Remote.Transport.DotNetty.TcpTransport.AssociateInternal(Address remoteAddress)
   at Akka.Remote.Transport.DotNetty.DotNettyTransport.Associate(Address remoteAddress)<---
]  

While this node keeps logging this exception, the other 2 crawlers stay quiet and seem to do nothing.
These are the 2 YAML files I used to deploy the services:

apiVersion: v1
kind: Service
metadata:
  name: crawler
  labels:
    app: crawler
spec:
  clusterIP: None
  ports:
  - port: 5213
  selector:
    app: crawler
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: crawler
  labels:
    app: crawler
spec:
  serviceName: crawler
  replicas: 3
  selector:
    matchLabels:
      app: crawler
  template:
    metadata:
      labels:
        app: crawler
    spec:
      terminationGracePeriodSeconds: 35
      containers:
      - name: crawler
        image: myregistry.ru:443/crawler:3
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "pbm 127.0.0.1:9110 cluster leave"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: CLUSTER_IP
          value: "$(POD_NAME).crawler"
        - name: CLUSTER_SEEDS
          value: akka.tcp://webcrawler@lighthouse-0.lighthouse:4053,akka.tcp://webcrawler@lighthouse-1.lighthouse:4053,akka.tcp://webcrawler@lighthouse-2.lighthouse:4053
        livenessProbe:
          tcpSocket:
            port: 5213
        ports:
        - containerPort: 5213
          protocol: TCP

apiVersion: v1
kind: Service
metadata:
  name: lighthouse
  labels:
    app: lighthouse
spec:
  clusterIP: None
  ports:
  - port: 4053
  selector:
    app: lighthouse
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: lighthouse
  labels:
    app: lighthouse
spec:
  serviceName: lighthouse
  replicas: 3
  selector:
    matchLabels:
      app: lighthouse
  template:
    metadata:
      labels:
        app: lighthouse
    spec:
      terminationGracePeriodSeconds: 35
      containers:
      - name: lighthouse
        image: myregistry.ru:443/lighthouse:1
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "pbm 127.0.0.1:9110 cluster leave"]
        env:
        - name: ACTORSYSTEM
          value: webcrawler
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: CLUSTER_IP
          value: "$(POD_NAME).lighthouse"
        - name: CLUSTER_SEEDS
          value: akka.tcp://webcrawler@lighthouse-0.lighthouse:4053,akka.tcp://webcrawler@lighthouse-1.lighthouse:4053,akka.tcp://webcrawler@lighthouse-2.lighthouse:4053
        livenessProbe:
          tcpSocket:
            port: 4053
        ports:
        - containerPort: 4053
          protocol: TCP

I assume that if the error above gets fixed, everything should work. Any ideas on how to solve it?

Upvotes: 1

Views: 722

Answers (1)

r.slesarev

Reputation: 21

OK, I managed to fix it. One of the Kubernetes nodes couldn't resolve DNS names. A simple reboot of that node solved the issue.
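
If you hit the same symptom, one way to confirm that DNS is the culprit before rebooting anything is to try resolving each seed host from inside the affected pod. Below is a minimal sketch, not something from the original setup: it assumes Python is available in the pod (or in a debug container scheduled on the same node), and the helper names `parse_seed` and `check_seeds` are made up for illustration. It reads the comma-separated `CLUSTER_SEEDS` value, extracts the host and port from each `akka.tcp://system@host:port` address, and reports either the resolved IPs or the DNS error.

```python
import os
import socket
from urllib.parse import urlsplit


def parse_seed(seed):
    """Split an address like akka.tcp://system@host:port into (host, port)."""
    parts = urlsplit(seed.strip())
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"cannot parse seed address: {seed!r}")
    return parts.hostname, parts.port


def check_seeds(cluster_seeds):
    """Map each seed in a comma-separated list to its resolved IPs or an error string."""
    results = {}
    for raw in cluster_seeds.split(","):
        seed = raw.strip()
        if not seed:
            continue  # tolerate an empty value or a trailing comma
        host, port = parse_seed(seed)
        try:
            infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
            # info[4] is the socket address tuple; its first element is the IP.
            results[seed] = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            # This is the same class of failure as the "No such device or
            # address" resolution error in the Akka.NET stack trace.
            results[seed] = f"DNS lookup failed: {exc}"
    return results


if __name__ == "__main__":
    # CLUSTER_SEEDS is the environment variable set in the StatefulSet spec.
    for seed, outcome in check_seeds(os.environ.get("CLUSTER_SEEDS", "")).items():
        print(seed, "->", outcome)
```

If every seed fails on one pod but resolves fine on pods scheduled to other nodes, the problem is most likely DNS on that node rather than the Akka.NET or StatefulSet configuration.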

Upvotes: 1
