matthieu.cham

OpenShift Online issue: pod with persistent volume fails scheduling

I have a small web app, consisting of a Python service and a PostgreSQL database (with, of course, a persistent volume), that had been running fine on OpenShift Online for 9 months.

All of a sudden, last Tuesday, the PostgreSQL pod stopped working, so I tried to redeploy the service. For almost 2 days now, scheduling of the pod has failed constantly. I have the following entry in the events log:

Failed Scheduling 0/110 nodes are available: 1 node(s) had disk pressure, 5 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector, 98 node(s) exceed max volume count. 37 times in the last 13 minutes

So it looks like a "disk full" issue in Red Hat's data centers, which should be easy to fix, but I don't see any notice of it on the status page (https://status.starter.openshift.com/).

My problem looks a lot like the one described for starter-us-west-1:

Investigating - Currently Openshift SRE team trying to resolve this incident. There are high chances that you will face difficulties having pods with attached volumes scheduled. We're sorry for the inconvenience.

Yet I'm on starter-ca-central-1, which should not be affected. Since it's been such a long time, I'm wondering whether anyone at Red Hat is aware of the issue, but I cannot find a way for users on the Starter plan to contact them.

Is anybody facing the same issue on ca-central-1?

Answers (2)

Cleverson Sacramento

After at least 4 months of working normally, my app on Starter US West 1 suddenly started getting the following error message during deployment:

0/106 nodes are available: 1 node(s) had disk pressure, 29 node(s) exceed max volume count, 3 node(s) were unschedulable, 4 node(s) had taints that the pod didn't tolerate, 6 node(s) didn't match node selector, 63 Insufficient cpu.

Nothing in the settings had changed before the failures started. I've noticed that the problem only occurs on deployments with a persistent volume, like PostgreSQL Persistent in my case.

I just submitted this issue through the above-mentioned URL. When I get a response or a solution, I'll post it here.

matthieu.cham

As Graham mentioned in the comments, https://help.openshift.com/forms/community-contact.html is the way to go.

A few hours (12, actually) after submitting my issue through that form, I got feedback from someone at Red Hat saying that my request had been taken into account.

This morning my app is finally up, and the incident notice is now on the status page:

Investigating - Currently Openshift SRE team trying to resolve this incident. There are high chances that you will face difficulties having pods with attached volumes scheduled. We're sorry for the inconvenience.

I'm not sure what would have happened if I hadn't contacted them...
