Phoenix

Reputation: 781

Ansible stops the whole playbook if all hosts in a single play fail

I'm struggling to understand the intended behavior of Ansible when all hosts in a single play fail but the playbook contains other plays that target other hosts.

For example consider the following playbook:

---
- name: P1
  hosts: a,b
  tasks:
    - name: Assert 1
      ansible.builtin.assert:
        that: 1==2
      when: inventory_hostname != "c"

- name: P2
  hosts: y,z
  tasks:
    - name: Debug 2
      ansible.builtin.debug:
        msg: 'YZ'

For simplicity, all four hosts (a, b, y, z) point to localhost.

What happens is that the assert fails and the whole playbook stops. However, this seems to contradict the documentation, which says that on an error Ansible stops executing on the failed host but continues on the other hosts (see Error handling).

If I change the condition to when: inventory_hostname != 'b', so that b does not fail, the playbook continues and executes the second play on hosts y,z.

To me the initial behavior does not seem reasonable: hosts y,z have not experienced any errors, so execution on them should not be prevented by errors on other hosts.

Is this a bug or am I missing something?

Upvotes: 4

Views: 4355

Answers (2)

Vladimir Botka

Reputation: 68189

It's not a bug. It's by design (see Notes 3 and 4 below). As discussed in the comments on the other answer, whether to terminate the whole playbook when all hosts in a play fail seems to be a trade-off: whichever default is chosen, the user has to handle the other case, either how to proceed to the next play or how to stop the whole playbook. The examples below show that both options require roughly the same amount of error handling in a block.

  • Ansible implements the first option: a playbook terminates when all hosts in a play fail. For example,
- hosts: host01,host02
  tasks:
    - assert:
        that: false
- hosts: host03
  tasks:
    - debug:
        msg: Hello
PLAY [host01,host02] *************************************************************************

TASK [assert] ********************************************************************************
fatal: [host01]: FAILED! => changed=false 
  assertion: false
  evaluated_to: false
  msg: Assertion failed
fatal: [host02]: FAILED! => changed=false 
  assertion: false
  evaluated_to: false
  msg: Assertion failed

PLAY RECAP ***********************************************************************************
host01                     : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   
host02                     : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
  • The playbook will proceed to the next play when not all hosts in a play fail. For example,
- hosts: host01,host02
  tasks:
    - assert:
        that: false
      when: inventory_hostname == 'host01'
- hosts: host03
  tasks:
    - debug:
        msg: Hello
PLAY [host01,host02] *************************************************************************

TASK [assert] ********************************************************************************
fatal: [host01]: FAILED! => changed=false 
  assertion: false
  evaluated_to: false
  msg: Assertion failed
skipping: [host02]

PLAY [host03] ********************************************************************************

TASK [debug] *********************************************************************************
ok: [host03] => 
  msg: Hello

PLAY RECAP ***********************************************************************************
host01                     : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   
host02                     : ok=0    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
host03                     : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
  • To proceed to the next play when all hosts in a play fail, a user has to clear the errors and, optionally, end the play for each failed host as well. For example,
- hosts: host01,host02
  tasks:
    - block:
        - assert:
            that: false
      rescue:
        - meta: clear_host_errors
        - meta: end_host
- hosts: host03
  tasks:
    - debug:
        msg: Hello
PLAY [host01,host02] *************************************************************************

TASK [assert] ********************************************************************************
fatal: [host01]: FAILED! => changed=false 
  assertion: false
  evaluated_to: false
  msg: Assertion failed
fatal: [host02]: FAILED! => changed=false 
  assertion: false
  evaluated_to: false
  msg: Assertion failed

TASK [meta] **********************************************************************************

TASK [meta] **********************************************************************************

TASK [meta] **********************************************************************************

PLAY [host03] ********************************************************************************

TASK [debug] *********************************************************************************
ok: [host03] => 
  msg: Hello

PLAY RECAP ***********************************************************************************
host01                     : ok=0    changed=0    unreachable=0    failed=0    skipped=0    rescued=1    ignored=0   
host02                     : ok=0    changed=0    unreachable=0    failed=0    skipped=0    rescued=1    ignored=0   
host03                     : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
  • Update: since this was 'fixed' in 2.12.2, meta end_play no longer stops the whole playbook.
In Ansible 2.12.1 it was possible to end the whole playbook with meta end_play. Suppose that the failure of all hosts in a play did not terminate the whole playbook, that is, suppose it were not implemented that way. A user might then want to terminate the playbook herself. For example,
- hosts: host01,host02
  tasks:
    - block:
        - assert:
            that: false
      rescue:
        - meta: clear_host_errors
        - set_fact:
            host_failed: true
    - meta: end_play
      when: ansible_play_hosts_all|map('extract', hostvars, 'host_failed') is all
      run_once: true
- hosts: host03
  tasks:
    - debug:
        msg: Hello
PLAY [host01,host02] *************************************************************************

TASK [assert] ********************************************************************************
fatal: [host01]: FAILED! => changed=false 
  assertion: false
  evaluated_to: false
  msg: Assertion failed
fatal: [host02]: FAILED! => changed=false 
  assertion: false
  evaluated_to: false
  msg: Assertion failed

TASK [meta] **********************************************************************************

TASK [set_fact] ******************************************************************************
ok: [host01]
ok: [host02]

TASK [meta] **********************************************************************************

PLAY RECAP ***********************************************************************************
host01                     : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=1    ignored=0   
host02                     : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=1    ignored=0
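
The when condition on meta: end_play in the last example can be read as "every host in the play has host_failed set". Below is a minimal Python sketch of the same check. It assumes hostvars is a plain dictionary keyed by host name; the function name all_hosts_flagged is hypothetical, and a missing fact is treated as false here, which may differ from how Ansible handles an undefined variable.

```python
# Plain-Python analogue of the Jinja2 condition used on `meta: end_play`.
# The function name and the hostvars layout are illustrative assumptions.

def all_hosts_flagged(play_hosts, hostvars, flag="host_failed"):
    """Return True only if every host in the play has a truthy `flag` fact."""
    # map('extract', hostvars, 'host_failed') looks up hostvars[host]['host_failed']
    # for each host; the `all` test then requires every value to be truthy.
    # Assumption: a host that never set the fact counts as "not failed".
    return all(hostvars[host].get(flag, False) for host in play_hosts)


play_hosts = ["host01", "host02"]

# Both hosts went through the rescue section and set host_failed: true.
both_failed = {"host01": {"host_failed": True}, "host02": {"host_failed": True}}

# Only host01 failed; host02 never set the fact.
one_failed = {"host01": {"host_failed": True}, "host02": {}}

print(all_hosts_flagged(play_hosts, both_failed))  # True
print(all_hosts_flagged(play_hosts, one_failed))   # False
```

Because the task also carries run_once: true, the condition is evaluated a single time for the whole play rather than once per host.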

Notes

  1. meta end_host means: 'end the play for this host'
- hosts: host01
  tasks:
    - meta: end_host
- hosts: host01,host02
  tasks:
    - debug:
        msg: Hello
PLAY [host01] ********************************************************************************

TASK [meta] **********************************************************************************

PLAY [host01,host02] *************************************************************************

TASK [debug] *********************************************************************************
ok: [host01] => 
  msg: Hello
ok: [host02] => 
  msg: Hello

PLAY RECAP ***********************************************************************************
host01                     : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
host02                     : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
  2. meta end_play means: 'end the playbook' (this was 'fixed' in 2.12.2; see #76672)
- hosts: host01
  tasks:
    - meta: end_play
- hosts: host01,host02
  tasks:
    - debug:
        msg: Hello
PLAY [host01] ********************************************************************************

TASK [meta] **********************************************************************************

PLAY RECAP ***********************************************************************************

  3. Quoting from #37309

If all hosts in the current play batch (fail) the play ends, this is 'as designed' behavior ... 'play batch' is 'serial size' or all hosts in play if serial is not set.

  4. Quoting from the source
# check the number of failures here, to see if they're above the maximum
# failure percentage allowed, or if any errors are fatal. If either of those
# conditions are met, we break out, otherwise, we only break out if the entire
# batch failed
failed_hosts_count = len(self._tqm._failed_hosts) + len(self._tqm._unreachable_hosts) - \
    (previously_failed + previously_unreachable)

if len(batch) == failed_hosts_count:
    break_play = True
    break
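
The check quoted from the source above can be modelled in a few lines of Python. This is only a simplified sketch of the logic, not Ansible's executor: run_playbook, split_into_batches and the play dictionaries are hypothetical, and max_fail_percentage, any_errors_fatal and unreachable hosts are left out.

```python
# Simplified model of the "entire batch failed" check; not Ansible's real code.
# run_playbook, split_into_batches and the play dictionaries are hypothetical.

def split_into_batches(hosts, serial=None):
    """Split hosts into batches of size `serial`; one batch if serial is unset."""
    if not hosts:
        return []
    if not serial:
        return [list(hosts)]
    return [list(hosts[i:i + serial]) for i in range(0, len(hosts), serial)]


def run_playbook(plays, failing_hosts):
    """Return the names of the plays that were started.

    plays: list of dicts with 'name', 'hosts' and optional 'serial'.
    failing_hosts: set of hosts whose task fails in any play they join.
    """
    started = []
    failed = set()  # hosts that failed earlier are excluded from later plays
    for play in plays:
        started.append(play["name"])
        break_play = False
        active_hosts = [h for h in play["hosts"] if h not in failed]
        for batch in split_into_batches(active_hosts, play.get("serial")):
            newly_failed = {h for h in batch if h in failing_hosts}
            failed |= newly_failed
            # Mirrors `len(batch) == failed_hosts_count` in the quoted source.
            if len(batch) == len(newly_failed):
                break_play = True
                break
        if break_play:
            break  # the remaining plays never start
    return started


plays = [
    {"name": "P1", "hosts": ["host01", "host02"]},
    {"name": "P2", "hosts": ["host03"]},
]
serial_plays = [
    {"name": "P1", "hosts": ["host01", "host02"], "serial": 1},
    {"name": "P2", "hosts": ["host03"]},
]

print(run_playbook(plays, {"host01", "host02"}))  # ['P1']
print(run_playbook(plays, {"host01"}))            # ['P1', 'P2']
print(run_playbook(serial_plays, {"host01"}))     # ['P1']
```

In the last call the first batch contains only host01, so a single failure already counts as the whole batch failing, which matches the 'play batch' wording quoted from #37309.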

Upvotes: 4

β.εηοιτ.βε

Reputation: 39294

A playbook with multiple plays is simply sequential; it cannot know in advance that other hosts will appear in a later play.

Because the assert task in your first play has exhausted all hosts of that play, it makes sense that the playbook stops there: it has nothing left to do for any further tasks in P1 and, remember, it knows nothing about P2 yet, so it simply ends there.

Upvotes: 1
