Lesha Pipiev

Reputation: 3333

AWS EKS: Ingress load balancer doesn't respond

There is a Spark core service up and running on port 7077 in an AWS EKS cluster.

> kubectl get service spark-core -n fargate-profile-selector

NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
spark-core   ClusterIP   10.100.142.199   <none>        7077/TCP,8080/TCP,6066/TCP   172m


> kubectl get pods -n fargate-profile-selector

NAME                                       READY   STATUS    RESTARTS   AGE
spark-master-controller-7f5dd4bf84-twpwb   1/1     Running   0          173m
spark-ui-proxy-c7kd9                       1/1     Running   0          173m
spark-worker-controller-6ccc46994f-5kwkp   1/1     Running   0          173m
spark-worker-controller-6ccc46994f-6gjng   1/1     Running   0          173m
spark-worker-controller-6ccc46994f-8x98q   1/1     Running   0          173m
spark-worker-controller-6ccc46994f-dn2hw   1/1     Running   0          173m
spark-worker-controller-6ccc46994f-xsbkv   1/1     Running   0          173m

> kubectl describe pod spark-master-controller-7f5dd4bf84-twpwb -n fargate-profile-selector

Name:                 spark-master-controller-7f5dd4bf84-twpwb
Namespace:            fargate-profile-selector
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 fargate-ip-172-31-4-37.us-west-2.compute.internal/172.31.4.37
Start Time:           Sun, 30 Oct 2022 21:10:40 +0200
Labels:               app=spark
                      chart=spark-0.0.1-366
                      component=spark-core
                      eks.amazonaws.com/fargate-profile=fargateprofile
                      heritage=Helm
                      pod-template-hash=7f5dd4bf84
                      release=spark
                      sdr.appname=spark
Annotations:          CapacityProvisioned: 0.25vCPU 2GB
                      Logging: LoggingDisabled: LOGGING_CONFIGMAP_NOT_FOUND
                      kubernetes.io/psp: eks.privileged
Status:               Running
IP:                   172.31.4.37
IPs:
  IP:           172.31.4.37
Controlled By:  ReplicaSet/spark-master-controller-7f5dd4bf84
Containers:
  spark-core:
    Container ID:  containerd://6bcb2a37b1fe1cfec0dfa0c23115a87889fc13b078553473d148512516d6ec8e
    Image:        <acc-id>.dkr.ecr.us-west-2.amazonaws.com/spark:3.0.1-dev-18
    Image ID:      <acc-id>.dkr.ecr.us-west-2.amazonaws.com/spark@sha256:7eb77fe90b97ee9da9e369df9a6795bfd32839c343678298dc5b84ee7ea7083d
    Ports:         7077/TCP, 8080/TCP, 6066/TCP, 40000/TCP, 40100/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      /opt/spark/bin/spark-class
    Args:
      org.apache.spark.deploy.master.Master
      --ip
      spark-core
      --port
      7077
      --webui-port
      8080
      --properties-file
      /opt/spark/conf/spark.conf

> kubectl describe ingress spark-core-ingress -n fargate-profile-selector                                            
Name:             spark-core-ingress
Labels:           <none>
Namespace:        fargate-profile-selector
Address:          a17976c980f984f6d998122f02f8109d-eb9d66d6481e7784.elb.us-west-2.amazonaws.com
Ingress Class:    <none>
Default backend:  <default>
Rules:
  Host                      Path  Backends
  ----                      ----  --------
  spark-core.comp.com  
                            /   spark-core:7077 (172.31.4.37:7077)
Annotations:                kubernetes.io/ingress.class: nginx
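
For reference, the describe output above corresponds to an Ingress roughly like the following. This is a sketch reconstructed from the output, not the original manifest; the apiVersion matches what the controller logs report (networking.k8s.io/v1beta1, since the cluster is v1.21):

```yaml
# Sketch reconstructed from the "kubectl describe ingress" output above.
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: spark-core-ingress
  namespace: fargate-profile-selector
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: spark-core.comp.com
      http:
        paths:
          - path: /
            backend:
              serviceName: spark-core
              servicePort: 7077
```

Note that nginx proxies matched requests to spark-core:7077 as HTTP, but per the master pod's args above, 7077 is Spark's RPC port and 8080 is the web UI, so an HTTP request proxied to 7077 may hang even once routing works.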

I use Nginx ingress controller:

    > kubectl get all --namespace=ingress-nginx
    NAME                                            READY   STATUS    RESTARTS   AGE
    pod/nginx-ingress-controller-5c6567c67d-68vk8   1/1     Running   0          40m
    pod/nginx-ingress-controller-5c6567c67d-885nc   1/1     Running   0          40m
    pod/nginx-ingress-controller-5c6567c67d-mhnxq   1/1     Running   0          40m
    
    NAME                    TYPE           CLUSTER-IP      EXTERNAL-IP                                                                     PORT(S)                                     AGE
    service/ingress-nginx   LoadBalancer   10.100.87.199   a17976c980f984f6d998122f02f8109d-eb9d66d6481e7784.elb.us-west-2.amazonaws.com   7077:30807/TCP,80:30801/TCP,443:30195/TCP   177m
    
    NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/nginx-ingress-controller   3/3     3            3           40m
    
    NAME                                                  DESIRED   CURRENT   READY   AGE
    replicaset.apps/nginx-ingress-controller-5c6567c67d   3         3         3       40m
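
Worth noting: the service above exposes 7077 directly on the LoadBalancer, and the controller logs below mention a tcp-services ConfigMap. For raw TCP traffic such as Spark's 7077, ingress-nginx forwards via that ConfigMap rather than an HTTP Ingress rule. A sketch of such a mapping (an assumption about intent, not the cluster's current config):

```yaml
# Sketch only: ingress-nginx tcp-services mapping for raw TCP on 7077.
# Each key is the external port; the value is "<namespace>/<service>:<port>".
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  "7077": fargate-profile-selector/spark-core:7077
```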

I see the following suspicious event (Readiness probe failed: HTTP probe failed with statuscode: 500):

> kubectl describe pod nginx-ingress-controller-5c6567c67d-885nc -n ingress-nginx
...
Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  54m   default-scheduler  Successfully assigned ingress-nginx/nginx-ingress-controller-5c6567c67d-885nc to ip-172-31-8-112.us-west-2.compute.internal
  Normal   Pulled     54m   kubelet            Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.30.0" already present on machine
  Normal   Created    54m   kubelet            Created container nginx-ingress-controller
  Normal   Started    54m   kubelet            Started container nginx-ingress-controller
  Warning  Unhealthy  54m   kubelet            Readiness probe failed: HTTP probe failed with statuscode: 500
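
To tell whether that 500 is a transient startup failure or a persistent problem, a couple of checks (a sketch; the pod name is taken from the output above, and ingress-nginx 0.30.x probes /healthz on port 10254 by default):

```shell
NS=ingress-nginx
POD=nginx-ingress-controller-5c6567c67d-885nc

if command -v kubectl >/dev/null 2>&1; then
  # Show the configured readiness probe for the controller container.
  kubectl get pod -n "$NS" "$POD" \
    -o jsonpath='{.spec.containers[0].readinessProbe}{"\n"}'

  # List only this pod's events; a single old Unhealthy event during
  # startup is usually harmless if the pod is Ready now.
  kubectl get events -n "$NS" --field-selector "involvedObject.name=$POD"
fi
```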

> kubectl logs nginx-ingress-controller-5c6567c67d-885nc -n ingress-nginx
I1030 21:30:36.698026       7 main.go:237] Running in Kubernetes cluster version v1.21+ (v1.21.14-eks-6d3986b) - git (clean) commit 8877a3e28d597e1184c15e4b5d543d5dc36b083b - platform linux/amd64
I1030 21:30:36.914228       7 main.go:102] SSL fake certificate created /etc/ingress-controller/ssl/default-fake-certificate.pem
I1030 21:30:36.946709       7 nginx.go:263] Starting NGINX Ingress controller
I1030 21:30:36.970410       7 event.go:281] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"nginx-configuration", UID:"b01aa3e0-c134-4bc8-8c8d-3496d906c06f", APIVersion:"v1", ResourceVersion:"7120", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress-nginx/nginx-configuration
I1030 21:30:36.971872       7 event.go:281] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"udp-services", UID:"e39e26ba-3e62-4f9f-8625-8926c10bc87c", APIVersion:"v1", ResourceVersion:"7138", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress-nginx/udp-services
I1030 21:30:36.972031       7 event.go:281] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"tcp-services", UID:"667f0d5b-b5e5-4532-b832-24ef2f98793f", APIVersion:"v1", ResourceVersion:"7130", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress-nginx/tcp-services
I1030 21:30:38.147200       7 nginx.go:307] Starting NGINX process
I1030 21:30:38.147316       7 leaderelection.go:242] attempting to acquire leader lease  ingress-nginx/ingress-controller-leader-nginx...
I1030 21:30:38.149071       7 controller.go:137] Configuration changes detected, backend reload required.
I1030 21:30:38.153011       7 status.go:86] new leader elected: nginx-ingress-controller-5c6567c67d-vf46c
I1030 21:30:38.248630       7 controller.go:153] Backend successfully reloaded.
I1030 21:30:38.248854       7 controller.go:162] Initial sync, sleeping for 1 second.
I1030 21:31:13.052326       7 leaderelection.go:252] successfully acquired lease ingress-nginx/ingress-controller-leader-nginx
I1030 21:31:13.052474       7 status.go:86] new leader elected: nginx-ingress-controller-5c6567c67d-885nc
I1030 21:51:12.252844       7 event.go:281] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"fargate-profile-selector", Name:"spark-core-ingress", UID:"874c0b49-dc15-44de-a09a-10d95d4699ea", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"49234", FieldPath:""}): type: 'Normal' reason: 'CREATE' Ingress fargate-profile-selector/spark-core-ingress
I1030 21:51:12.252909       7 controller.go:137] Configuration changes detected, backend reload required.
I1030 21:51:12.351617       7 controller.go:153] Backend successfully reloaded.
I1030 21:51:13.065232       7 status.go:274] updating Ingress fargate-profile-selector/spark-core-ingress status from [] to [{ a17976c980f984f6d998122f02f8109d-eb9d66d6481e7784.elb.us-west-2.amazonaws.com}]
I1030 21:51:13.079963       7 event.go:281] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"fargate-profile-selector", Name:"spark-core-ingress", UID:"874c0b49-dc15-44de-a09a-10d95d4699ea", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"49241", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress fargate-profile-selector/spark-core-ingress

I tried both:

curl -I a17976c980f984f6d998122f02f8109d-eb9d66d6481e7784.elb.us-west-2.amazonaws.com

and

curl -i -H "Host:spark-core.comp.com" a17976c980f984f6d998122f02f8109d-eb9d66d6481e7784.elb.us-west-2.amazonaws.com

Both of them hang without returning any error.
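
An alternative to setting the Host header by hand is curl's --resolve, which maps the Ingress host to the load balancer's address so the request looks exactly like normal DNS resolution. A sketch (--max-time keeps a hanging backend from blocking forever; the exit status is ignored since the point is to see where it stalls):

```shell
ELB=a17976c980f984f6d998122f02f8109d-eb9d66d6481e7784.elb.us-west-2.amazonaws.com
HOST=spark-core.comp.com

# Map HOST:80 to one of the ELB's current addresses; curl then sends
# "Host: $HOST" itself, as a browser would.
ADDR=$(dig +short "$ELB" | head -n1)
RESOLVE="${HOST}:80:${ADDR}"

curl -iv --max-time 10 --resolve "$RESOLVE" "http://${HOST}/" || true
```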

Update: added the Host header here:

curl -i -H "Host:spark-core.comp.com" a17976c980f984f6d998122f02f8109d-eb9d66d6481e7784.elb.us-west-2.amazonaws.com

Upvotes: 1

Views: 840

Answers (1)

Harsh Manvar

Reputation: 30083

I think the issue is that you are hitting the external IP of the Nginx ingress controller directly.

spark-core.comp.com is the domain in your ingress rule, and the Nginx ingress controller routes based on it.

I'm also not sure why you tried passing the host name as a custom header, as the ingress controller checks the Host of the request rather than a custom header.

Generally, we map the domain spark-core.comp.com with a CNAME to the load balancer's external hostname; if you are on a local system, you can add an entry to the /etc/hosts file and check the domain in the browser.

When you hit curl -I a17976c980f984f6d998122f02f8109d-eb9d66d6481e7784.elb.us-west-2.amazonaws.com, it should return a 404 from the Nginx controller.

Upvotes: 1
