502 bad gateway errors when using ALB and aws-load-balancer-controller

Question

We have a service on our EKS cluster, this service is an API that receives many thousands of requests per day. Occasionally, we have noticed when making a request we get a 502 error. If i were to guess out of 100 requests maybe 10-to-20 would be 502 errors.

We are using aws load balancer controller - https://github.com/kubernetes-sigs/aws-load-balancer-controller

example response

    status: 502,
    statusText: 'Bad Gateway',
    headers: {
      server: 'awselb/2.0',
      date: 'Wed, 06 Oct 2021 10:24:19 GMT',
      'content-type': 'text/html',
      'content-length': '122',
      connection: 'close'
    },

Troubleshooting

The service is not crashing and does not receive the requests being made ( which return 502, we can identify this using correlation-ids sent from the client to the service).
When port-forwarding to bypass the alb and make a direct connection requests to the service we do not experience this problem.

From the above we have determined that these 502's are not from our application/service.

Upon further research we have noticed others experiencing a similar issue to ours.

Environment

AWS Load Balancer controller version: v2.1.3
Kubernetes version: 1.19
Using EKS (yes/no), if so version?: Yes/v1.19.13-eks-8df270

Please see configuration details below:

Service Deployment Config

kubectl get service --selector=app=entity-extractor-api-staging -n staging
NAME                           TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
entity-extractor-api-staging   NodePort   172.20.95.5           80:31037/TCP   18h

kubectl get deployment --selector=app=entity-extractor-api-staging -n staging
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
entity-extractor-api-staging   1/1     1            1           18h

apiVersion: v1
kind: Service
metadata:
  name: entity-extractor-api-staging
  labels:
    app: entity-extractor-api-staging
  namespace: staging
spec:
  type: NodePort
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: entity-extractor-api-staging
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: entity-extractor-api-staging
  labels:
    app: entity-extractor-api-staging
  namespace: staging
spec:
  replicas: 1
  selector:
    matchLabels:
      app: entity-extractor-api-staging
  template:
    metadata:
      labels:
        app: entity-extractor-api-staging
        log-label: 'true'
    spec:
      containers:
      - name: entity-extractor-api-staging
        image: :$TAG
        imagePullPolicy: Always
        env: 
        ports:
        - containerPort: 80
        resources: {}
      nodeSelector:
        acme/node-type: worker

Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: staging-ingress
  namespace: staging
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/group.name: ""
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80,"HTTPS": 443}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-2::certificate/0250a551-8971-468d-a483-cad28f890463,arn:aws:acm:us-east-2::certificate/b32e9708-7aeb-495b-87b1-8532a2592eeb
    alb.ingress.kubernetes.io/tags: Environment=prod,Team=dev
    alb.ingress.kubernetes.io/healthcheck-path: /health
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: '300'
    # alb.ingress.kubernetes.io/load-balancer-attributes: access_logs.s3.enabled=true,access_logs.s3.bucket=dev-ingress-logs-acme,access_logs.s3.prefix=dev-ingress
spec:
  rules:
    ....
    - host: entity-extractor.staging.
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: entity-extractor-api-staging
                port:
                  number: 80

example alb log

type | time | elb | client_ip | client_port | target_ip | target_port | request_processing_time | target_processing_time | response_processing_time | elb_status_code | target_status_code | received_bytes | sent_bytes | request_verb | request_url | request_proto | user_agent | ssl_cipher | ssl_protocol | target_group_arn | trace_id | domain_name | chosen_cert_arn | matched_rule_priority | request_creation_time | actions_executed | redirect_url | lambda_error_reason | target_port_list | target_status_code_list | classification | classification_reason
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
https | 2021-10-06T14:36:19.995743Z | app/k8s-acme-78db7a121a/27d8ce64549c8574 | 148.252.239.114 | 52152 | 10.0.2.240 | 31037 | 0 | 0.001 | -1 | 502 | - | 481 | 272 | POST | https://entity-extractor.staging..com | arn:aws:acm:us-east-2::certificate/b32e9708-7aeb-495b-87b1-8532a2592eeb | 17 | 2021-10-06T14:36:19.901000Z | forward | - | - | 10.0.2.240:31037 | - | - | -

If there is any other information you need , please let me know.

502 bad gateway errors when using ALB and aws-load-balancer-controller

Answers (1)

Related Questions