Reputation: 1873
Sometimes I have a bunch of jobs to launch, and each of them mounts a PVC. Since our resources are limited, some pods fail to mount their volumes in time and I see this event:
Unable to mount volumes for pod "package-job-120348968617328640-5gv7s_vname(b059856a-ecfa-11ea-a226-fa163e205547)": timeout expired waiting for volumes to attach or mount for pod "vname"/"package-job-120348968617328640-5gv7s". list of unmounted volumes=[tmp]. list of unattached volumes=[log tmp].
And it does keep retrying, but it never succeeds (the event age reads something like 44s (x11 over 23m)). However, if I delete this pod, the job creates a new pod and it completes.
So why is this happening? Shouldn't the pod retry the mount automatically instead of needing manual intervention? And if this is unavoidable, is there a workaround to automatically delete pods that have been stuck in the Init phase for more than 2 minutes?
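For the last question, one possible workaround is a small cleanup loop (run manually or from a cron job) that deletes Pending pods older than a threshold so the Job controller recreates them. This is a minimal sketch, assuming jq is available; the <namespace> placeholder and the 120-second threshold are assumptions, and in practice you would add a label selector so it only touches these job pods:

    # Delete pods that have been Pending for more than 120 seconds,
    # so the Job controller replaces them with fresh pods.
    kubectl get pods -n <namespace> -o json \
      | jq -r '.items[]
          | select(.status.phase == "Pending")
          | select((now - (.metadata.creationTimestamp | fromdateiso8601)) > 120)
          | .metadata.name' \
      | xargs -r kubectl delete pod -n <namespace>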
Update: it turned out that the attach script provided by my cloud provider was stuck on some of the nodes (caused by a network problem). So if others run into this problem, checking the storage plugin that attaches the disks is a good idea.
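To spot a stuck attach/mount on a node, the kubelet logs are usually the quickest place to look. A minimal sketch, assuming a systemd-managed kubelet on the node (<node> is a placeholder):

    # On the affected node, check recent kubelet logs for mount/attach errors
    ssh <node> 'journalctl -u kubelet --since "30 min ago" | grep -iE "mount|attach"'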
Upvotes: 0
Views: 1372
Reputation: 7080
I had the same problem, even when the volume was attached to the same node where the pod was running. I SSHed into the node and restarted kubelet, and that fixed the issue.
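For reference, on a node where the kubelet runs as a systemd service (an assumption; the unit name can vary by distro), the restart looks like:

    ssh <node>
    sudo systemctl restart kubelet
    systemctl status kubelet    # confirm it is active again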
Upvotes: 0
Reputation: 128837
So why is this happening? Shouldn't pod retry mount automatically instead of needing manual intervention? And if this is not avoidable, is there a workaround that it will automatically delete pods in Init Phase more than 2 min?
There can be multiple reasons for this. Do you see any Events on the Pod when you run kubectl describe pod <podname>? And do you reuse a PVC that another Pod used before?
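For example (the pod name is a placeholder), the Events section at the bottom of the describe output usually names the failing volume, and you can also list events for the whole namespace:

    kubectl describe pod <podname>
    # or list all events, newest last
    kubectl get events --sort-by=.metadata.creationTimestamp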
I guess that you use a regional cluster, consisting of multiple datacenters (Availability Zones), and that your PVC is located in one AZ while your Pod is scheduled to run in a different AZ? In that situation, the Pod will never be able to mount the volume, since it is located in another AZ.
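You can check this by comparing the zone labels of the PersistentVolume and the node (the names are placeholders; older clusters use the failure-domain.beta.kubernetes.io/zone label instead):

    # Zone of the volume backing the PVC
    kubectl get pv <pv-name> -L topology.kubernetes.io/zone
    # Zone of the node the pod was scheduled on
    kubectl get node <node-name> -L topology.kubernetes.io/zone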
Upvotes: 2