Why am I losing my connection to my MongoDB after my GKE node gets preempted?

Question

I am running a Mongo, Express, Node, React app in a GKE cluster that is set up with a preemptible VM (to save money). I use mongoose to connect to my MongoDB database, which is hosted on MongoDB Atlas. Everything works fine when the pod first starts. However, when my node gets preempted, I lose the connection to my MongoDB instance. I then have to manually scale the deployment down to 0 replicas and back up again, after which the connection to MongoDB is restored. Below are the error I am getting and the code for my Mongo connection.

Is this just an intended effect of using a preemptible instance? Is there any way to deal with it, for example by automatically scaling the deployment after a preemption? I was previously running a GKE Autopilot cluster and had no problems, but that was a little expensive for my purposes. Thanks

const mongoose = require('mongoose');

mongoose
    .connect(process.env.MONGODB_URL, {
        useNewUrlParser: true,
        useUnifiedTopology: true,
        useFindAndModify: false,
    })
    .then(() => console.log('mongoDB connected...'));

(node: 24) UnhandledPromiseRejectionWarning: Error: querySrv ECONNREFUSED _mongodb._tcp.clusterx.xxxxx.azure.mongodb.net at QueryReqWrap.onresolve (dns.js:203)
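The warning is reported as an unhandled rejection because the promise returned by mongoose.connect has no .catch handler, so the failed DNS (querySrv) lookup is never dealt with. A minimal sketch of handling it instead: retry the initial connection with exponential backoff, and rethrow once every attempt has failed. The helper name connectWithRetry, its options, and the delay values are hypothetical, and the connect function is passed in so the sketch runs without a database.

```javascript
// Hypothetical helper: retry an async connect function with exponential backoff.
// `connect` is injected so this sketch runs without a database; in the
// question's app it would wrap mongoose.connect(...) (assumption).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function connectWithRetry(
  connect,
  { retries = 5, baseDelayMs = 1000, log = console.error } = {},
) {
  for (let attempt = 1; ; attempt++) {
    try {
      // Resolve with whatever the connect promise resolves to.
      return await connect();
    } catch (err) {
      // After `retries` retries (retries + 1 attempts in total), give up and
      // rethrow so the caller decides what to do, e.g. exit the process.
      if (attempt > retries) throw err;
      const delayMs = baseDelayMs * 2 ** (attempt - 1); // 1 s, 2 s, 4 s, ...
      log(`connect attempt ${attempt} failed (${err.message}); retrying in ${delayMs} ms`);
      await sleep(delayMs);
    }
  }
}
```

In the question's app this could be called as `connectWithRetry(() => mongoose.connect(process.env.MONGODB_URL, options)).then(() => console.log('mongoDB connected...')).catch((err) => { console.error(err); process.exit(1); })`. Whether a few retries are enough for DNS to recover after a preemption is an assumption. Exiting with a non-zero code as the fallback lets Kubernetes restart the container rather than leaving it running without a database connection.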

kombadzomba · Accepted Answer

The VM preemption can be reproduced in the Google Cloud console under Compute Engine -> Instance groups -> Restart/Replace VMs by choosing the Replace option. After the VM has been replaced, the containers are recreated too, but unfortunately with the network issues described in the question.

My solution was to add liveness and readiness probes to the Kubernetes Pods/Deployment via a /health URL, which checks whether MongoDB is available and returns status code 500 if it is not. Details on how to define liveness and readiness probes are in the Kubernetes documentation. Kubernetes restarts a container when its liveness probe fails, and stops routing Service traffic to a pod while its readiness probe fails. The containers started after that point no longer have the network issues.
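The answer does not show its /health handler. A minimal sketch, assuming Mongoose's documented connection readyState values (1 means connected), of an endpoint that returns 200 when the database connection is up and 500 otherwise. It uses Node's built-in http module so it runs on its own; the names healthResponse and createHealthServer are hypothetical, and in an Express app the same logic would sit in an app.get('/health', ...) route.

```javascript
const http = require('http');

// Mongoose documents connection.readyState as 0 = disconnected,
// 1 = connected, 2 = connecting, 3 = disconnecting.
const CONNECTED = 1;

// Map the current readyState to the response the probes will see:
// 200 when connected, 500 otherwise.
function healthResponse(readyState) {
  if (readyState === CONNECTED) {
    return { status: 200, body: 'ok' };
  }
  return { status: 500, body: 'mongodb unavailable' };
}

// getReadyState is injected (hypothetical); in the app it would be
// () => mongoose.connection.readyState.
function createHealthServer(getReadyState) {
  return http.createServer((req, res) => {
    if (req.method === 'GET' && req.url === '/health') {
      const { status, body } = healthResponse(getReadyState());
      res.writeHead(status, { 'Content-Type': 'text/plain' });
      res.end(body);
      return;
    }
    res.writeHead(404, { 'Content-Type': 'text/plain' });
    res.end('not found');
  });
}
```

For example, `createHealthServer(() => mongoose.connection.readyState).listen(3200)` would serve the endpoint on the port used by the probes in the spec. Checking readyState only reports what the driver currently believes about the connection; a stricter check could ping the server, at the cost of a slower probe.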

The YAML spec block in my project looks like this:

spec:
  containers:
  - env:
    - name: MONGO_URL
      value: "$MONGO_URL"
    - name: NODE_ENV
      value: "$NODE_ENV"
    image: gcr.io/$GCP_PROJECT/$APP_NAME:$APP_VERSION
    imagePullPolicy: IfNotPresent
    name: my-container
    # the readiness probe details
    readinessProbe:
      httpGet: # make an HTTP request
        port: 3200 # port to use
        path: /health # endpoint to hit
        scheme: HTTP # or HTTPS
      initialDelaySeconds: 5 # how long to wait before checking
      periodSeconds: 5 # how long to wait between checks
      successThreshold: 1 # how many successes to hit before accepting
      failureThreshold: 1 # how many failures to accept before failing
      timeoutSeconds: 3 # how long to wait for a response
    # the liveness probe details
    livenessProbe:
      httpGet: # make an HTTP request
        port: 3200 # port to use
        path: /health # endpoint to hit
        scheme: HTTP # or HTTPS
      initialDelaySeconds: 15 # how long to wait before checking
      periodSeconds: 5 # how long to wait between checks
      successThreshold: 1 # how many successes to hit before accepting
      failureThreshold: 2 # how many failures to accept before failing
      timeoutSeconds: 3 # how long to wait for a response
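As a rough worked example with these settings, consider a pod that comes up after a preemption and never manages to connect, so /health returns 500 from the start. Ignoring probe scheduling jitter, and assuming each failing probe answers promptly rather than running into timeoutSeconds, the liveness probe first runs after initialDelaySeconds and the container is restarted after failureThreshold consecutive failures:

```latex
t_{\text{restart}} \approx \text{initialDelaySeconds} + (\text{failureThreshold} - 1) \times \text{periodSeconds} = 15 + (2 - 1) \times 5 = 20\ \text{s}
```

Meanwhile the pod never passes its readiness probe, so it receives no Service traffic while it waits to be restarted.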
