stev-e

Reputation: 456

ansible: Retry a task based on content of task's result.stderr

I am looking for a solution to the following situation: I automate machine provisioning in my homelab with Ansible. Because the network connection is poor, I want to make my playbooks a bit more robust against (short) network outages.

Now imagine a task like this:

- name: Install base dependencies for ... xyz
  become: true
  ansible.builtin.apt:
    update_cache: true
    pkg:
      - package 1
      - package ...
      - package n
  register: _result

That task occasionally fails. Now, in that case, I want to check the _result object's stderr for common known errors regarding network issues and retry the task only if it fails because of network issues. Like so:

- name: Install base dependencies for ... xyz
  become: true
  ansible.builtin.apt:
    update_cache: true
    pkg:
      - package 1
      - package ...
      - package n
  register: _result
  until: 'not "Connection timed out" in _result.stderr' # <-- timeout check
  retries: 3
  delay: 60

But this will of course raise an exception if the task completes without issues and stderr is not present.

Checking that stderr is present before accessing it might be an idea, but I could not figure out how to do so in an until condition.

Do you have any idea?

Upvotes: 1

Views: 195

Answers (2)

Vladimir Botka

Reputation: 68004

Q: "This will raise an exception if the task completes without issues and stderr is not present."

A: Use the filter default

  until: "not 'Connection timed out' in _result.stderr|default('')"
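
For context, here is a minimal sketch of the complete task with that condition in place. It reuses the placeholder packages and the retry settings from the question, and it assumes the timeout text really ends up in stderr, which I have not verified for every apt failure mode.

```yaml
- name: Install base dependencies for ... xyz
  become: true
  ansible.builtin.apt:
    update_cache: true
    pkg:
      - package 1
      - package ...
      - package n
  register: _result
  # default('') supplies an empty string when stderr is missing (e.g. on success),
  # so the "in" test no longer fails on an undefined attribute
  until: "not 'Connection timed out' in _result.stderr|default('')"
  retries: 3
  delay: 60
```

A run that succeeds, or that fails for some other reason, satisfies the condition immediately and is not retried. Only a result whose stderr contains the timeout text triggers another attempt.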

Upvotes: 2

Kevin C

Reputation: 5720

If you are 100% sure the task will succeed without a network connection, you might use a 'fire & forget' method.

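  apt:
    update_cache: true
    pkg:
      - package 1
      - package ...
      - package n
  # poll: 0 only puts the task in the background when async is also set;
  # 600 seconds is an assumed maximum runtime, adjust as needed
  async: 600
  poll: 0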

When you set poll: 0, Ansible starts the task and immediately moves on to the next task without waiting for a result. Each async task runs until it either completes, fails or times out (runs longer than its async value). The playbook run ends without checking back on async tasks.
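
If you do want to check back on the job later in the play, a sketch using ansible.builtin.async_status could look like the following. The variable names _apt_job and _apt_status, as well as the async, retries and delay values, are assumptions for illustration and not taken from the question.

```yaml
- name: Install base dependencies in the background
  become: true
  ansible.builtin.apt:
    update_cache: true
    pkg:
      - package 1
      - package ...
      - package n
  async: 600          # assumed maximum runtime in seconds
  poll: 0             # start the job and move on immediately
  register: _apt_job  # hypothetical name; holds ansible_job_id

- name: Wait for the background install to finish
  become: true        # must match the task that started the job
  ansible.builtin.async_status:
    jid: "{{ _apt_job.ansible_job_id }}"
  register: _apt_status
  until: _apt_status.finished
  retries: 30         # assumed: up to 30 checks
  delay: 10           # assumed: 10 seconds between checks
```

Without the second task, the play never learns whether the install succeeded, so this variant gives up the 'forget' part in exchange for a result you can act on.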

Note that it is quite difficult, and perhaps bad practice, to code against network failures.
The best solution would actually be to solve the network issues.

Upvotes: 0
