Belabacsi

Reputation: 133

Slow Ansible performance when using loop with large YAML var

Hello Developer Community!

I have been working on some Ansible playbooks to manage Citrix NetScaler configuration and would like to get some help with the following. I have this data structure defined in a variable named nsapp_lb_server:

nsapp_lb_server:
    - name:                      "SRV-1"
      ipaddress:                 "10.102.102.1"
      comment:                   "Chewbacca"

    - name:                      "SRV-2"
      ipaddress:                 "10.102.102.2"
      comment:                   "C-3PO"

    - name:                      "SRV-3"
      ipaddress:                 "10.102.102.3"
      comment:                   "Obi-Wan Kenobi"
...

[+ another 1200 items...]

and I have the following task:

  - name: "Check variables (loop)"
    ansible.builtin.assert:
        that:
            - ( (item.name is defined) and (item.name | length > 0) )
            - ( (item.ipaddress is defined) and (item.ipaddress | ipaddr() == item.ipaddress) )
            - ( (item.comment | length > 0) if (item.comment is defined) else omit )
    loop: "{{ nsapp_lb_server }}"

My problem is that when there are thousands of records in the nsapp_lb_server variable, the loop is incredibly slow: the task takes about 30 minutes to finish, which is far too long... :-(

After some digging on the Internet, it seems the issue is caused by Ansible's loop keyword itself, so I would like to check whether there are any other methods I could use instead of loop.

Are there any alternatives to Ansible's loop that can produce the same result (iterating over the entries of the variable)? I was thinking about using json_query, but I still do not know how to implement it in this specific case.

My environment:

$ ansible --version
ansible [core 2.12.6]
  config file = /home/ansible/.ansible.cfg
  configured module search path = ['/home/ansible/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/ansible/ansible/lib/python3.9/site-packages/ansible
  ansible collection location = /home/ansible/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/ansible/ansible/bin/ansible
  python version = 3.9.7 (default, Sep 21 2021, 00:13:39) [GCC 8.5.0 20210514 (Red Hat 8.5.0-3)]
  jinja version = 3.0.3
  libyaml = True

Could anyone please point me in the right direction? I have been working on this code base for a long time, and after testing it with a large data set it seems unusable because of the running time. I have also checked the hardware resources allocated to the VM the Ansible controller runs on; nothing looks problematic there.

Many thanks in advance!

Upvotes: 3

Views: 2207

Answers (2)

Mike Reiche

Reputation: 460

I have serious performance issues with with_items; these tasks are the most expensive single tasks of the whole playbook:

my-service : Set 'client_internal_ids' --- 62.13s
my-service : Set 'client_internal_ids' --- 61.73s

and the task is quite simple:

- name: "Set 'client_internal_ids'"
  set_fact:
    client_internal_ids: "{{ client_internal_ids|default({}) | combine ( { item.clientId: item.id } ) }}"
  no_log: true
  with_items:
    - "{{ all_clients_response.json }}"

That's why I created a dedicated filter plugin (a simple filter placed under filter_plugins/) that converts the list into a dictionary inline:


from typing import Any


def items2dict(source: Any, key: Any, value: Any = None) -> dict:
    """Converts items to a single dict without using with_items/loop/combine."""
    target_dict = {}

    # Accept a list of dicts as-is; iterate a dict (or an arbitrary object)
    # over its items.
    if isinstance(source, list):
        items = source
    elif isinstance(source, dict):
        items = source.items()
    else:
        items = source.__dict__.items()

    for item in items:
        # Use the attribute named by 'key'/'value' when the item carries it,
        # otherwise fall back to the literal argument.
        key_val = item[key] if key in item else key
        value_val = item[value] if value in item else value
        target_dict[key_val] = value_val

    return target_dict


class FilterModule(object):
    """Expose the filter to Ansible (required for files under filter_plugins/)."""

    def filters(self):
        return {'items2dict': items2dict}

Usage:

- name: "Set 'client_internal_ids'"
  set_fact:
    client_internal_ids: "{{ all_clients_response.json | items2dict('clientId','id') }}"
  no_log: true

where clientId is the item key and id the item value.

The task doesn't show up in the profiling list anymore.
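
Note that recent Ansible releases also ship a built-in filter with the same name, ansible.builtin.items2dict, whose key_name/value_name parameters should cover this case without a custom plugin. An untested sketch using the variable names above (and if you do keep the custom plugin, be aware that it shares the built-in's name):

- name: "Set 'client_internal_ids' (built-in items2dict filter)"
  ansible.builtin.set_fact:
    # key_name picks the attribute that becomes the dict key,
    # value_name the attribute that becomes the dict value.
    client_internal_ids: "{{ all_clients_response.json | ansible.builtin.items2dict(key_name='clientId', value_name='id') }}"
  no_log: true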

Upvotes: 1

flowerysong

Reputation: 2877

Running this validation as thousands of individual tasks is very slow because it adds a lot of execution and callback overhead. You can instead do it in a single task, with the caveat that it will be harder to track down the invalid list item(s):

- hosts: localhost
  gather_facts: false
  vars:
    nsapp_lb_server: "{{ nsapp_lb_samples * 10000 }}"
    nsapp_lb_samples:
        - name:                      "SRV-1"
          ipaddress:                 "10.102.102.1"
          comment:                   "Chewbacca"
        - name:                      "SRV-2"
          ipaddress:                 "10.102.102.2"
          comment:                   "C-3PO"
        - name:                      "SRV-3"
          ipaddress:                 "10.102.102.3"
          comment:                   "Obi-Wan Kenobi"
  tasks:
    - assert:
        that:
          - nsapp_lb_server | rejectattr('name') | length == 0
          - (nsapp_lb_server | map(attribute='ipaddress') | map('ipaddr')) == (nsapp_lb_server | map(attribute='ipaddress'))
          - nsapp_lb_server | selectattr('comment', 'defined') | rejectattr('comment') | length == 0

This runs in ~5 seconds for the 30,000 test entries I fed it.

To make it easier to find the bad values without making the task extremely ugly, you can split it up into a series of tasks:

- hosts: localhost
  gather_facts: false
  vars:
    nsapp_lb_server: "{{ nsapp_lb_samples * 10000 }}"
    nsapp_lb_samples:
        - name:                      "SRV-1"
          ipaddress:                 "10.102.102.1"
          comment:                   "Chewbacca"
        - name:                      "SRV-2"
          ipaddress:                 "10.102.102.2"
          comment:                   "C-3PO"
        - name:                      "SRV-3"
          ipaddress:                 "10.102.102.3"
          comment:                   "Obi-Wan Kenobi"
  tasks:
    - name: Check for missing names
      assert:
        that: nsapp_lb_server | rejectattr('name', 'defined') | length == 0
        fail_msg: "Bad entries: {{ nsapp_lb_server | rejectattr('name', 'defined') }}"

    - name: Check for bad names
      assert:
        that: nsapp_lb_server | rejectattr('name') | length == 0
        fail_msg: "Bad entries: {{ nsapp_lb_server | rejectattr('name') }}"

    - name: Check for missing IP addresses
      assert:
        that: nsapp_lb_server | rejectattr('ipaddress', 'defined') | length == 0
        fail_msg: "Bad entries: {{ nsapp_lb_server | rejectattr('ipaddress', 'defined') }}"

    - name: Check for bad IP addresses
      assert:
        that: (nsapp_lb_server | map(attribute='ipaddress') | map('ipaddr')) == (nsapp_lb_server | map(attribute='ipaddress'))
        fail_msg: "Suspicious values: {{ nsapp_lb_server | map(attribute='ipaddress') | map('ipaddr') | symmetric_difference(nsapp_lb_server | map(attribute='ipaddress')) }}"

    - name: Check for bad comments
      assert:
        that: nsapp_lb_server | selectattr('comment', 'defined') | rejectattr('comment') | length == 0
        fail_msg: "Bad entries: {{ nsapp_lb_server | selectattr('comment', 'defined') | rejectattr('comment') }}"

This runs in ~12 seconds for the same list of 30,000 test entries.
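
If you want a middle ground, a single task that still reports what is wrong, something along these lines should also work. This is an untested sketch that only covers the defined/empty checks; the IP-format comparison above can be folded in the same way:

- name: Validate nsapp_lb_server in a single task
  vars:
    # These expressions are templated lazily, when referenced in that/fail_msg.
    missing_names: "{{ nsapp_lb_server | rejectattr('name', 'defined') | list }}"
    empty_names: "{{ nsapp_lb_server | selectattr('name', 'defined') | rejectattr('name') | list }}"
    missing_ips: "{{ nsapp_lb_server | rejectattr('ipaddress', 'defined') | list }}"
    empty_comments: "{{ nsapp_lb_server | selectattr('comment', 'defined') | rejectattr('comment') | list }}"
  ansible.builtin.assert:
    that:
      - missing_names | length == 0
      - empty_names | length == 0
      - missing_ips | length == 0
      - empty_comments | length == 0
    fail_msg: >-
      Missing names: {{ missing_names }};
      empty names: {{ empty_names }};
      missing IP addresses: {{ missing_ips }};
      empty comments: {{ empty_comments }}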

Upvotes: 1
