Ellisein

Reputation: 1028

ArgoCD: transport: Error while dialing dial tcp: lookup argocd-repo-server

I am configuring ArgoCD, and all the pods are in the Running state, as shown below.

$ kubectl get pods -n argocd -o wide
NAME                              READY   STATUS    RESTARTS   AGE    IP               NODE        NOMINATED NODE   READINESS GATES
argocd-application-controller     1/1     Running   0          138m   172.16.195.218   worker-1    <none>           <none>
argocd-applicationset-controller  1/1     Running   0          138m   172.16.195.216   worker-1    <none>           <none>
argocd-dex-server                 1/1     Running   0          138m   172.16.59.213    worker-2    <none>           <none>
argocd-notifications-controller   1/1     Running   0          138m   172.16.195.217   worker-1    <none>           <none>
argocd-redis                      1/1     Running   0          138m   172.16.59.214    worker-2    <none>           <none>
argocd-repo-server                1/1     Running   0          46m    172.16.59.216    worker-2    <none>           <none>
argocd-server                     1/1     Running   0          138m   172.16.59.215    worker-2    <none>           <none>

But when I create a new app, ArgoCD shows the following error.

Unable to create application: application spec for test is invalid: InvalidSpecError: repository not accessible: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup argocd-repo-server on 10.96.0.10:53: read udp 172.16.59.215:50498->10.96.0.10:53: i/o timeout"

This error occurs not only with a private Git repository but also with a public GitHub repository. And curl to the Git repository from the worker-2 node works fine.

It seems that the connection from argocd-server to argocd-repo-server is timing out, but I cannot understand why this problem occurs.

My Environment:

Upvotes: 12

Views: 21328

Answers (7)

Oleksandr K.

Reputation: 1

Restarting the DNS pods helped for me. You can find all the related namespaces and pods under the DNS cluster operator.

Cluster Operator DNS

OpenShift Container Platform v4.16 has two relevant namespaces: openshift-dns-operator and openshift-dns.

Rebooting pods in those namespaces helped me.

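A minimal sketch of that restart, assuming a standard CoreDNS Deployment named coredns in kube-system on plain Kubernetes, and the default OpenShift namespaces named above:

# Plain Kubernetes: restart the CoreDNS pods
kubectl -n kube-system rollout restart deployment coredns

# OpenShift Container Platform: recreate the DNS operator and DNS pods
oc -n openshift-dns-operator delete pods --all
oc -n openshift-dns delete pods --all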

Upvotes: 0

D. Brakes

Reputation: 11

For anyone else landing here who is stuck: check that your argocd-repo-server Service still exists. Argo becomes completely non-functional if this Service is not present.

Additionally, since Argo can't determine what's missing, it cannot re-create the service itself.

You will have to recreate the Service manually, which may mean running helm template or kustomize build . to render the manifest for the Service and re-applying it.

In my case, mine looked like this:

---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: repo-server
    app.kubernetes.io/name: argocd-repo-server
    app.kubernetes.io/part-of: argocd
  name: argocd-repo-server
  namespace: argocd
spec:
  ports:
  - name: server
    port: 8081
    protocol: TCP
    targetPort: 8081
  - name: metrics
    port: 8084
    protocol: TCP
    targetPort: 8084
  selector:
    app.kubernetes.io/name: argocd-repo-server
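A hedged follow-up: a quick way to confirm the Service is missing and to re-apply a rendered manifest (the file name below is just an example) could be:

kubectl -n argocd get svc argocd-repo-server               # "NotFound" means the Service is gone
kubectl -n argocd apply -f argocd-repo-server-svc.yaml     # file containing the manifest above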

Upvotes: 0

WannaGetHigh

Reputation: 3924

For those arriving here because they had an issue configuring the ArgoCD provider in Terraform: you need to set server_addr like this:

  • wrong: https://my-argo-cd-server | https://my-argo-cd-server:443
  • good: my-argo-cd-server:443

Of course, set the right port if yours uses one other than the default HTTPS port.

Upvotes: 0

ssaid

Reputation: 75

I was having the same problem. I switched to the argocd namespace and issued kubectl rollout restart deployment to restart all the deployments, and that solved the problem.
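A minimal sketch of those commands, assuming the default argocd namespace (in recent Argo CD versions the application controller runs as a StatefulSet, so restart that as well):

kubectl -n argocd rollout restart deployment      # restart every Deployment in the namespace
kubectl -n argocd rollout restart statefulset     # covers argocd-application-controller if it is a StatefulSet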

Upvotes: 1

Ron

Reputation: 6755

I had the same issue, and after hours of debugging I found it was because I had installed CoreDNS after installing ArgoCD.

So I just reset the whole cluster, installed CoreDNS first and then ArgoCD, and the issue was gone.

Upvotes: 3

mdraevich

Reputation: 408

According to your logs, you've probably hit an IP connectivity issue with the DNS server. Because it cannot resolve the domain name, argocd-server cannot initiate a connection to argocd-repo-server.

A general plan for how to troubleshoot such issues (a command sketch follows the list):

  1. Be sure that your DNS pod is up and running in your K8s cluster.
  2. Be sure your pod has IP connectivity to the K8s DNS server.
  3. Be sure your pod has access to UDP/53 on the K8s DNS server.
  4. Be sure the DNS entry your pod is asking for resolves, i.e. argocd-repo-server resolves to an IP address.
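A minimal command sketch for these checks, assuming CoreDNS runs in kube-system with the standard k8s-app=kube-dns label, the default cluster.local domain, and the cluster DNS address from the error message (10.96.0.10):

# Step 1: DNS pods up and running?
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide

# Steps 2-4: reachability, UDP/53 access, and name resolution from a throwaway pod
kubectl -n argocd run dns-test --rm -it --image=busybox:1.36 --restart=Never -- \
  nslookup argocd-repo-server.argocd.svc.cluster.local 10.96.0.10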

See the well-explained DNS debugging guide on kubernetes.io for more details.

Upvotes: 7

pebkm

Reputation: 1

I don't have a Calico-based config, but I had the same issue when I started messing with the Argo server Service.

With a clean setup, I used a NodePort configuration for the server instead of LoadBalancer. Without any further tunnelling, both the web portal and the argocd CLI worked with the repository connection. I would recommend trying the same.
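If you want to try the same, a minimal sketch (assuming the default argocd-server Service name) for switching it from LoadBalancer to NodePort:

kubectl -n argocd patch svc argocd-server -p '{"spec": {"type": "NodePort"}}'
kubectl -n argocd get svc argocd-server     # note the assigned NodePorts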

Upvotes: 0
