ABCD1133

Reputation: 1

Zero-downtime updates on Kubernetes when file uploads are in progress

My goal is to do zero-downtime updates on Kubernetes.

But there is a problem related to file uploads.

The situation is this: when a user uploads a file, the webserver stores it first, and then the WAS saves the file's metadata to the DB.

The problem is that when we update the webservers, they do not wait for in-flight requests to finish, so file uploads and downloads fail if a client is connected to a webserver that is shutting down.

What am I supposed to do about this?

Upvotes: 0

Views: 384

Answers (1)

laimison

Reputation: 1710

In short, there is no magic tool in Kubernetes that can solve it for any type of application.

What is the main goal?

  • Delete a pod gracefully (if you can do that, it is a big step)

  • The app supports both versions running at the same time (for rollout and rollback)

So how to achieve zero downtime deployments and updates?

Kubernetes/Docker:

  • The application runs as PID 1 in the container, so it receives the SIGTERM (standard graceful shutdown) signal directly

  • You specify terminationGracePeriodSeconds in the StatefulSet or Deployment. When you scale the application down (or a pod is deleted to be replaced by a new one), no new connections are routed to the pod, Kubernetes sends SIGTERM to the app and waits up to terminationGracePeriodSeconds for it to exit. Usually 5-10 minutes is enough to drain connections, but long-running ones could need hours. If the app handles SIGTERM and exits on its own, the pod terminates earlier.

  • A working readinessProbe check
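
The application's side of the points above can be sketched roughly as follows. This is only an illustration, not a drop-in implementation: the class DrainState and its method names are hypothetical, and it assumes the web framework calls request_started/request_finished around each upload or download and that the readinessProbe endpoint returns readiness_status().

```python
import signal
import threading


class DrainState:
    """Tracks in-flight requests and whether shutdown was requested (hypothetical helper)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self.shutting_down = False

    def request_started(self):
        with self._cond:
            self._in_flight += 1

    def request_finished(self):
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def begin_shutdown(self, signum=None, frame=None):
        # Used as the SIGTERM handler: it only sets a flag and takes no lock,
        # so it is safe to run while the main thread is interrupted.
        self.shutting_down = True

    def readiness_status(self):
        # HTTP status for the readiness endpoint: report "not ready" once draining.
        return 503 if self.shutting_down else 200

    def wait_for_drain(self, timeout):
        # Block until no requests are in flight; returns False on timeout.
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)


def install_sigterm_handler(state):
    # Must be called from the main thread of the process running as PID 1.
    signal.signal(signal.SIGTERM, state.begin_shutdown)
```

The timeout passed to wait_for_drain should stay below terminationGracePeriodSeconds, so the process exits by itself before Kubernetes stops waiting.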

Application:

  • Ideally, it understands SIGTERM and shuts down its operations gracefully

  • The app should be able to retry a connection or operation if the first attempt fails (e.g. an API call from frontend to backend, a DB query from the backend, etc.) - this lets the operation be retried on a new pod, and in general retries are a good thing in highly available systems

  • The app does its job in smaller chunks, combined with the retries mentioned above, for example using http://resumablejs.com to avoid long file transfers and long-lived connections

  • A strategy for schema changes: both versions should be supported at the same time (for instance, if you add a new column to the DB, it is better to add it in the first release and start using it in the second; there are many other techniques like this)
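
As a rough sketch of the "smaller chunks plus retry" idea from the list above: the function names here are hypothetical, and send_chunk stands in for whatever call actually transfers one chunk to the server (for example an HTTP request per chunk). It assumes the server can accept the same chunk twice without harm.

```python
import time


def split_into_chunks(data, chunk_size):
    """Yield (offset, chunk) pairs covering the whole payload."""
    for offset in range(0, len(data), chunk_size):
        yield offset, data[offset:offset + chunk_size]


def upload_with_retry(data, send_chunk, chunk_size=1024 * 1024,
                      max_attempts=5, base_delay=0.5):
    """Upload data chunk by chunk, retrying each failed chunk with backoff.

    send_chunk(offset, chunk) is a caller-supplied (hypothetical) function
    that raises ConnectionError when the connection is lost, e.g. because
    the pod serving it was terminated during a rollout.
    """
    for offset, chunk in split_into_chunks(data, chunk_size):
        for attempt in range(1, max_attempts + 1):
            try:
                send_chunk(offset, chunk)
                break  # this chunk is done, move on to the next one
            except ConnectionError:
                if attempt == max_attempts:
                    raise  # give up after the last attempt
                # Exponential backoff gives a replacement pod time to become ready.
                time.sleep(base_delay * 2 ** (attempt - 1))
```

Because only the failed chunk is re-sent, a pod that disappears mid-upload costs one chunk rather than the whole file, and each request is short enough to finish within the grace period.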

Application (last resort):

  • Some companies that cannot afford downtime, but for which deploying a new version this way is too complicated or not possible, decide to queue new connections (with additional code) while the app is upgrading. Files and DB records are then imported from the old version into the new one to complete the zero-downtime deployment.

Upvotes: 1
