I am running a Mongo, Express, Node, React app in a GKE cluster that is set up with a preemptible VM (to save money). I am using mongoose to connect to my MongoDB database, which is hosted on MongoDB Atlas. Everything works fine when the pod first starts. However, when my node gets preempted, I lose the connection to my MongoDB instance. I then have to manually scale the deployment down to 0 replicas and back up again, and that restores the connection to MongoDB. Below are the error I am getting and the code for my Mongo connection. Is this just an expected effect of using a preemptible instance? Is there any way to deal with it, for example by automatically scaling the deployment after a preemption? I was previously running a GKE Autopilot cluster and had no problems, but that was a little expensive for my purposes. Thanks
const mongoose = require('mongoose');

mongoose
  .connect(process.env.MONGODB_URL, {
    useNewUrlParser: true,
    useUnifiedTopology: true,
    useFindAndModify: false,
  })
  .then(() => console.log('mongoDB connected...'));
(node: 24) UnhandledPromiseRejectionWarning: Error: querySrv ECONNREFUSED _mongodb._tcp.clusterx.xxxxx.azure.mongodb.net at QueryReqWrap.onresolve (dns.js:203)
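The `UnhandledPromiseRejectionWarning` comes from the `.then()` chain above having no `.catch()`. According to the Mongoose connection docs, Mongoose does not automatically retry a failed *initial* connection, so once the SRV DNS lookup fails after a preemption, the app stays disconnected until the container is restarted. One possible complement to the probe-based answer is to catch the error and retry. The sketch below is a minimal, dependency-free illustration under that assumption; `connectWithRetry` and its `retries`/`delayMs` options are hypothetical names, not part of the Mongoose API.

```javascript
// Hypothetical helper: retries a promise-returning connect function with a
// fixed delay between attempts, making up to retries + 1 attempts in total.
// In the question's app, connectFn could be
// () => mongoose.connect(process.env.MONGODB_URL, options).
function connectWithRetry(connectFn, { retries = 5, delayMs = 5000 } = {}) {
  let attempt = 0;

  const tryOnce = () => {
    attempt += 1;
    return connectFn().catch((err) => {
      if (attempt > retries) {
        // Out of retries: surface the last error to the caller
        throw err;
      }
      console.error(
        `Connect attempt ${attempt} failed: ${err.message}; retrying in ${delayMs} ms`
      );
      // Wait, then try again
      return new Promise((resolve) => setTimeout(resolve, delayMs)).then(tryOnce);
    });
  };

  return tryOnce();
}
```

In the app this could be called as `connectWithRetry(() => mongoose.connect(process.env.MONGODB_URL, options))`, followed by a `.catch()` that logs the error and calls `process.exit(1)`, so that Kubernetes restarts the container once the retries are exhausted instead of leaving it running without a database connection.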
The VM preemption can be reproduced in Compute Engine -> Instance groups -> Restart/Replace VMs by choosing the Replace option. After the VM has been replaced, the containers are recreated too, but unfortunately with the network issues described in the question.
My solution was to add liveness and readiness probes to the Kubernetes Pods/Deployment that call a /health URL, which checks whether MongoDB is available and returns status code 500 if it is not. Details on how to define liveness and readiness probes are in the Kubernetes documentation. Kubernetes restarts a container whose liveness probe keeps failing, and the restarted containers no longer have the network issues.
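The /health handler itself is not shown in the answer. Below is a minimal sketch, assuming the check is based on Mongoose's `connection.readyState` (where `1` means connected); `createHealthHandler` is a hypothetical name, not something from the original project. It only uses the standard Node `(req, res)` interface (`statusCode`, `setHeader`, `end`), so it can be mounted on an Express route or a plain `http` server.

```javascript
// Hypothetical helper (not from the original answer): wraps any
// "is MongoDB reachable?" check in a handler for the /health endpoint.
// Example check: () => mongoose.connection.readyState === 1 (1 = connected).
function createHealthHandler(isDbConnected) {
  return (req, res) => {
    const ok = Boolean(isDbConnected());
    // 200 passes the probe; 500 fails it, so Kubernetes marks the pod unready
    // (readiness probe) and, once failureThreshold is reached, restarts the
    // container (liveness probe).
    res.statusCode = ok ? 200 : 500;
    res.setHeader('Content-Type', 'application/json');
    res.end(JSON.stringify({ db: ok ? 'up' : 'down' }));
  };
}
```

With Express this could be registered as `app.get('/health', createHealthHandler(() => mongoose.connection.readyState === 1))`. Checking `readyState` is cheap. A stricter variant could send a ping command to the database inside the handler, at the cost of a round trip to Atlas on every probe.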
The YAML spec block in my project looks like this:
spec:
  containers:
    - env:
        - name: MONGO_URL
          value: "$MONGO_URL"
        - name: NODE_ENV
          value: "$NODE_ENV"
      image: gcr.io/$GCP_PROJECT/$APP_NAME:$APP_VERSION
      imagePullPolicy: IfNotPresent
      name: my-container
      # the readiness probe details
      readinessProbe:
        httpGet: # make an HTTP request
          port: 3200 # port to use
          path: /health # endpoint to hit
          scheme: HTTP # or HTTPS
        initialDelaySeconds: 5 # how long to wait before the first check
        periodSeconds: 5 # how long to wait between checks
        successThreshold: 1 # consecutive successes needed to mark the pod ready
        failureThreshold: 1 # consecutive failures before marking the pod unready
        timeoutSeconds: 3 # how long to wait for a response
      # the liveness probe details
      livenessProbe:
        httpGet: # make an HTTP request
          port: 3200 # port to use
          path: /health # endpoint to hit
          scheme: HTTP # or HTTPS
        initialDelaySeconds: 15 # how long to wait before the first check
        periodSeconds: 5 # how long to wait between checks
        successThreshold: 1 # must be 1 for liveness probes
        failureThreshold: 2 # consecutive failures before the container is restarted
        timeoutSeconds: 3 # how long to wait for a response