jsharpe

Reputation: 2671

Nomad fails to release CSI volume during "restart -reschedule", which would move allocations to a new host

Context:

Problem:

From what I can tell, nomad doesn't even try to release the volume. There's no "failed to release" message in any log file (nomad server, nomad client, ebs controller, ebs node).

The first error I see anywhere is this:

[ERROR] nomad.fsm: CSIVolumeClaim failed: error="volume max claims reached"

which occurs on the new node as it attempts to mount the volume.

At this point the previous allocation is dead/stopped, but the volume is still mounted on the previous host, and the volume is marked as unavailable.
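One way to confirm this state from the CLI (the volume and instance IDs below are placeholders, and the AWS CLI check assumes access to the account):

nomad volume status my-ebs-volume        # still shows a claim on the old node; volume unavailable
aws ec2 describe-volumes --volume-ids vol-0123456789abcdef0        # still reports an attachment to the old instance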

Upvotes: 2

Views: 211

Answers (1)

aZaD

Reputation: 29

  • First, upgrade both Nomad and the AWS EBS CSI plugin to their latest versions. Newer releases often include fixes for compatibility issues and bugs like this one.
  • Then, recover the stuck volume manually (a sketch of these steps follows this list). Identify the stuck volume: find the allocation ID and volume ID of the problematic volume, and make sure the previous allocation really is stopped before proceeding. Detach the volume manually: use the AWS CLI or the EBS API to detach the volume from the old host. Clear the "max claims reached" error: delete the volume claim left over from the old allocation; this frees Nomad to claim and mount the volume on the new host.
  • Lastly, consider using Nomad's -force flag with restart -reschedule to force the release of volumes. Use this cautiously, as it can lead to data loss if not handled carefully.
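A minimal sketch of the manual recovery, assuming the AWS CLI is configured and a Nomad version recent enough to support nomad volume detach; the volume, node, and instance IDs are placeholders:

nomad volume status my-ebs-volume
# confirm which node still holds the claim and that the old allocation is stopped

nomad volume detach my-ebs-volume <old-node-id>
# asks Nomad to run the CSI unpublish steps and release the claim on the old node

aws ec2 detach-volume --volume-id vol-0123456789abcdef0 --force
# if the CSI plugin cannot complete the detach, force it at the EBS level

nomad volume deregister -force my-ebs-volume
nomad volume register volume.hcl
# last resort: drop and re-register the volume definition to clear stale claims

After the claim is cleared, the rescheduled allocation should be able to mount the volume on the new host.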

Upvotes: 0
