lopek

Reputation: 522

Recreate Pod managed by a StatefulSet with a fresh PersistentVolume

Occasionally I need to perform a rolling replacement of all Pods in my StatefulSet such that all PVs are also recreated from scratch. The reason is to get rid of the underlying hard drives that still use old versions of an encryption key. This operation should not be confused with a regular rolling upgrade, during which I still want the volumes to survive Pod terminations. The best routine I have figured out so far is the following:

  1. Delete the PV.
  2. Delete the PVC.
  3. Delete the Pod.
  4. Wait until all deletions complete.
  5. Manually recreate the PVC deleted in step 2.
  6. Wait for the new Pod to finish streaming data from other Pods in the StatefulSet.
  7. Repeat from step 1. for the next Pod.
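For reference, the routine above could be sketched with kubectl roughly like this (the resource names and the PVC manifest file are placeholders; adjust them to your StatefulSet):

```shell
# Placeholder names for one member of the StatefulSet.
POD=foo-bar-0
PVC=data-foo-bar-0
PV=$(kubectl get pvc "$PVC" -o jsonpath='{.spec.volumeName}')

kubectl delete pv "$PV" --wait=false    # step 1
kubectl delete pvc "$PVC" --wait=false  # step 2
kubectl delete pod "$POD"               # step 3

# Step 4: block until the PV is actually gone.
kubectl wait --for=delete pv "$PV" --timeout=10m

# Step 5: recreate the PVC manually from a saved manifest.
kubectl apply -f pvc-foo-bar-0.yaml
```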

I'm not happy about step 5. I wish the StatefulSet recreated the PVC for me, but unfortunately it does not. I have to do it myself, otherwise Pod creation fails with the following error:

Warning  FailedScheduling   3s (x15 over 15m)  default-scheduler   persistentvolumeclaim "foo-bar-0" not found

Is there a better way to do that?

Upvotes: 6

Views: 9158

Answers (4)

Kristofer

Reputation: 8624

I tried your steps above, but deleting the PVC either failed or it got auto-recreated before I noticed. In any case, on modern Kubernetes (I'm using GKE Autopilot 1.25.9) the following steps seem to be enough:

  1. Delete the pv
  2. Delete the pod

At least for me this caused the underlying disk to be replaced with no need to recreate pvc manually.
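Those two steps could be sketched like this (names are placeholders; this assumes the PV's reclaim policy actually releases the underlying disk):

```shell
# Look up the PV bound to the pod's PVC, then delete PV and pod.
PV=$(kubectl get pvc data-foo-0 -o jsonpath='{.spec.volumeName}')
kubectl delete pv "$PV" --wait=false
kubectl delete pod foo-0
```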

p.s. I needed to do this to recover an Elasticsearch node that had a full disk and refused to start (fatal exception while booting Elasticsearch: java.io.UncheckedIOException: Failed to load persistent cache). I didn't want to expand the disk size, as many guides suggest, because the real problem was that data cleanup in the cluster was broken, so more space wasn't actually needed.

Upvotes: 0

Ben Langfeld

Reputation: 253

This is described in https://github.com/kubernetes/kubernetes/issues/89910. The workaround proposed there, deleting the new Pod that gets stuck Pending, works: the second time the Pod is replaced, a new PVC is created. The issue was marked as a duplicate of https://github.com/kubernetes/kubernetes/issues/74374 and reported as potentially fixed in 1.20.
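A sketch of that workaround (pod name is a placeholder): after deleting the PVC and Pod, the replacement Pod sits in Pending because its PVC is missing; deleting it once more lets the StatefulSet controller recreate both the Pod and a fresh PVC.

```shell
# The replacement pod is stuck Pending with "persistentvolumeclaim not found";
# deleting it a second time triggers recreation of the PVC as well.
kubectl delete pod foo-bar-0
```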

Upvotes: 3

jpdstan

Reputation: 61

I just recently had to do this. The following worked for me:

# Delete the PVC
$ kubectl delete pvc <pvc_name>

# Delete the StatefulSet WITHOUT deleting its pods
# (--cascade=orphan; on kubectl older than 1.20 use --cascade=false)
$ kubectl delete statefulset <statefulset_name> --cascade=orphan

# Delete the pod whose PVC you want replaced
$ kubectl delete pod <pod_name>

# Re-apply the StatefulSet manifest; the controller recreates
# the deleted pod along with a new PVC
$ kubectl apply -f <statefulset_yaml>

Upvotes: 6

Anton Matsiuk

Reputation: 690

It seems like you're using a "Persistent" volume the wrong way: it is designed to keep data across roll-outs, not to delete it. There are other ways to rotate the keys. For example, you can use a Kubernetes Secret or ConfigMap to mount the key into the Pod; then you only need to recreate the Secret during a rolling update.
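A sketch of that approach (the Secret name, key file, and StatefulSet name are hypothetical): replace the key material in a Secret, then roll the pods so they pick it up.

```shell
# Rotate the key stored in a Secret (apply over the existing one).
kubectl create secret generic encryption-key \
  --from-file=key=./new-encryption.key \
  --dry-run=client -o yaml | kubectl apply -f -

# Trigger a rolling restart so pods remount the updated Secret.
kubectl rollout restart statefulset foo
```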

Upvotes: -2
