Reputation: 5750
Using Ansible v2.9.12
Question: I'd like Ansible to fail and stop the play as soon as a task fails on any host, when multiple hosts execute that task. In other words, Ansible should abort the task before it runs on the remaining hosts. The configuration has to work inside a role, so using serial, or splitting the work into different plays, is not possible.
Example:
- hosts:
    - host1
    - host2
    - host3
  any_errors_fatal: true
  tasks:
    - name: always fail
      shell: /bin/false
      throttle: 1
This produces:
===== task | always fail =======
host1: fail
host2: fail
host3: fail
Meaning, the task is still executed on the second and third hosts. I'd like the whole play to fail and stop as soon as the task fails on one host. If the task only fails on the last host, Ansible should abort as well.
Desired outcome:
===== task | always fail =======
host1: fail
host2: not executed/skipped, because host1 failed
host3: not executed/skipped, because host1 failed
As you can see, I've fiddled around with error handling, but to no avail.
Background info: I've been developing an idempotent Ansible role for mysql. It can set up a cluster with multiple hosts. The role also supports adding an arbiter.
The arbiter does not have the mysql application installed, but the host is still required in the play.
Now, imagine three hosts. Host1 is the arbiter; host2 and host3 have mysql installed and are set up as a cluster. The applications are set up by the Ansible role. Now, Ansible executes the role for a second, third, or fourth time and changes a mysql config setting. Mysql then needs a rolling restart. Usually, one writes something along the lines of:
- template:
    src: mysql.j2
    dest: /etc/mysql
  register: mysql_config
  when: mysql.role != 'arbiter'
- service:
    name: mysql
    state: restarted
  throttle: 1
  when:
    - mysql_config.changed
    - mysql.role != 'arbiter'
The downside of this configuration is that if mysql fails to start on host2 for whatever reason, Ansible will still restart mysql on host3. That is undesired, because if mysql fails on host3 as well, the cluster is lost. So, for this specific task, I'd like Ansible to stop, abort, or skip the remaining hosts as soon as mysql has failed to start on a single host in the play.
Upvotes: 4
Views: 1655
Reputation: 5750
Ok, this works:
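# Note: host_which_is_skipped: true is set for test-multi-01
---
- hosts:
    - test-multi-01
    - test-multi-02
    - test-multi-03
  tasks:
    # Replace the boolean flag with the name of the host to skip.
    # default(false) guards hosts where the variable is not defined.
    - set_fact:
        host_which_is_skipped: "{{ inventory_hostname }}"
      when: host_which_is_skipped | default(false)
    # Run from the first host only, delegating to each play host in turn.
    # Once one iteration fails, the remaining iterations are skipped.
    - shell: /bin/false
      run_once: yes
      delegate_to: "{{ item }}"
      loop: "{{ ansible_play_hosts }}"
      when:
        - item != host_which_is_skipped
        - result is undefined or result is not failed
      register: result
    - meta: end_play
      when: result is failed
    - debug:
        msg: Will not happen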
When the shell command is set to /bin/true, the command is executed on host2 and host3.
Upvotes: 1
Reputation: 39244
Disclaimer: most of the credit for the fully working part of this answer goes to @Tomasz Klosinski's answer on Server Fault.
First, here is a partially working idea that falls short by only one host.
For the demo, I purposely increased the number of hosts to 5.
The idea is based on the special variables ansible_play_batch and ansible_play_hosts_all, which the Ansible documentation on special variables describes as:
ansible_play_hosts_all
List of all the hosts that were targeted by the play
ansible_play_batch
List of active hosts in the current play run limited by the serial, aka ‘batch’. Failed/Unreachable hosts are not considered ‘active’.
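To see how the two lists relate, a minimal sketch such as the following prints both sizes from a single host. It is only an illustration: the message text is made up, and the counts depend on your inventory and on which hosts have already failed.

```yaml
- hosts: all
  gather_facts: no
  tasks:
    # ansible_play_batch shrinks when hosts fail or become unreachable,
    # while ansible_play_hosts_all keeps every host targeted by the play.
    - debug:
        msg: "{{ ansible_play_batch | length }} of {{ ansible_play_hosts_all | length }} hosts are still active"
      run_once: true
```

When the two lengths differ, at least one host has dropped out of the play, which is the condition the next playbook relies on.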
The idea, combined with your attempt at using throttle: 1, should work, but it falls short by one host: it still executes on host2 when it should skip it.
Given the playbook:
- hosts: all
  gather_facts: no
  tasks:
    - shell: /bin/false
      when: "ansible_play_batch | length == ansible_play_hosts_all | length"
      throttle: 1
This yields the recap:
PLAY [all] ***********************************************************************************************************
TASK [shell] *********************************************************************************************************
fatal: [host1]: FAILED! => {"changed": true, "cmd": "/bin/false", "delta": "0:00:00.003915", "end": "2020-09-06 22:09:16.550406", "msg": "non-zero return code", "rc": 1, "start": "2020-09-06 22:09:16.546491", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
fatal: [host2]: FAILED! => {"changed": true, "cmd": "/bin/false", "delta": "0:00:00.004736", "end": "2020-09-06 22:09:16.844296", "msg": "non-zero return code", "rc": 1, "start": "2020-09-06 22:09:16.839560", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
skipping: [host3]
skipping: [host4]
skipping: [host5]
PLAY RECAP ***********************************************************************************************************
host1 : ok=0 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
host2 : ok=0 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
host3 : ok=0 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
host4 : ok=0 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
host5 : ok=0 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
Looking into this further, I landed on this answer on Server Fault, which looks like the right idea for crafting your solution.
Instead of going the normal way, the idea is to delegate everything from the first host with a loop over all targeted hosts of the play. Inside a loop, you can easily access the result registered for the previous host, as long as you register it.
So here is the playbook:
- hosts: all
  gather_facts: no
  tasks:
    - shell: /bin/false
      loop: "{{ ansible_play_hosts }}"
      register: failing_task
      when: "failing_task | default({}) is not failed"
      delegate_to: "{{ item }}"
      run_once: true
This would yield the recap:
PLAY [all] ***********************************************************************************************************
TASK [shell] *********************************************************************************************************
failed: [host1 -> host1] (item=host1) => {"ansible_loop_var": "item", "changed": true, "cmd": "/bin/false", "delta": "0:00:00.003706", "end": "2020-09-06 22:18:23.822608", "item": "host1", "msg": "non-zero return code", "rc": 1, "start": "2020-09-06 22:18:23.818902", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
skipping: [host1] => (item=host2)
skipping: [host1] => (item=host3)
skipping: [host1] => (item=host4)
skipping: [host1] => (item=host5)
NO MORE HOSTS LEFT ***************************************************************************************************
PLAY RECAP ***********************************************************************************************************
host1 : ok=0 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
And just to prove that it works as intended, here it is altered to make host2 fail specifically, with the help of failed_when:
- hosts: all
  gather_facts: no
  tasks:
    - shell: /bin/false
      loop: "{{ ansible_play_hosts }}"
      register: failing_task
      when: "failing_task | default({}) is not failed"
      delegate_to: "{{ item }}"
      run_once: true
      failed_when: "item == 'host2'"
Yields the recap:
PLAY [all] ***********************************************************************************************************
TASK [shell] *********************************************************************************************************
changed: [host1 -> host1] => (item=host1)
failed: [host1 -> host2] (item=host2) => {"ansible_loop_var": "item", "changed": true, "cmd": "/bin/false", "delta": "0:00:00.004226", "end": "2020-09-06 22:20:38.038546", "failed_when_result": true, "item": "host2", "msg": "non-zero return code", "rc": 1, "start": "2020-09-06 22:20:38.034320", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
skipping: [host1] => (item=host3)
skipping: [host1] => (item=host4)
skipping: [host1] => (item=host5)
NO MORE HOSTS LEFT ***************************************************************************************************
PLAY RECAP ***********************************************************************************************************
host1 : ok=0 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
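Applied to the rolling restart from the question, the same pattern might look like the sketch below. It is untested against a real cluster and rests on a few assumptions: mysql_restart is a hypothetical register name, and mysql and mysql_config are assumed to be readable per host through hostvars[item]. That lookup is needed because, with run_once, the conditions are otherwise evaluated with the first host's variables rather than the delegated host's.

```yaml
- name: Restart mysql one host at a time, stop at the first failure
  service:
    name: mysql
    state: restarted
  loop: "{{ ansible_play_hosts }}"
  delegate_to: "{{ item }}"
  run_once: true
  register: mysql_restart  # hypothetical name
  when:
    # Skip every remaining host once a previous iteration has failed.
    - mysql_restart | default({}) is not failed
    # Read the per-host values of the delegated host, not the first host.
    - hostvars[item].mysql.role != 'arbiter'
    - hostvars[item].mysql_config is changed
```

As in the recaps above, a failure on any delegated host is reported against the first host of the play, since that is where the run_once task nominally runs.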
Upvotes: 0
Reputation: 4485
One way to solve this would be to run the playbook with serial: 1. That way, the tasks are executed serially on the hosts, and as soon as one task fails, the playbook terminates:
- name: My playbook
  hosts: all
  serial: 1
  any_errors_fatal: true
  tasks:
    - name: Always fail
      shell: /bin/false
In this case, the task is only executed on the first host. Note that there is also the order keyword, which lets you control the order in which the hosts are run: https://docs.ansible.com/ansible/latest/user_guide/playbooks_intro.html#hosts-and-users
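As a small sketch of how order combines with serial, the play below processes the hosts one at a time in alphabetical order by name. The value sorted is just one choice for illustration; other accepted values include inventory (the default), reverse_inventory, reverse_sorted, and shuffle.

```yaml
- name: My playbook
  hosts: all
  serial: 1
  order: sorted
  any_errors_fatal: true
  tasks:
    - name: Always fail
      shell: /bin/false
```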
Upvotes: 0