Martin Guthrie

Reputation: 143

Ansible playbook wait until all pods running

I have a working Ansible playbook that polls the output of kubectl get pods -o json until the pod is in the Running state. Now I want to extend this to multiple pods. The core issue is that the JSON result of the kubectl query contains a list: I know how to access the first item, but not all of the items.

- name: wait for pods to come up
  shell: kubectl get pods -o json
  register: kubectl_get_pods
  until: kubectl_get_pods.stdout|from_json|json_query('items[0].status.phase') == "Running"
  retries: 20

The items list in the JSON output looks like this:

[  { ...  "status": { "phase": "Running" } },
   { ...  "status": { "phase": "Running" } },
   ...
]

Using [0] to access the first item worked for handling one object in the list, but I can't figure out how to extend it to multiple items. I tried [*] which did not work.

Upvotes: 14

Views: 21144

Answers (7)

Wiper

Reputation: 95

The kubernetes.core.k8s module for Ansible has built-in wait functionality.

However, different resource types have different wait_condition types. For a Deployment, type: Complete (as seen below) works well as long as you set suitable timeout bounds, but if the YAML file also contains other resource types, such as ServiceAccounts, the task will most likely hang.

- name: Deploy the stack
  kubernetes.core.k8s:
    state: present
    src: "{{ dir }}my.yaml"
    wait: yes
    wait_sleep: 10
    wait_timeout: 600
    wait_condition:
      type: Complete
      status: "True"
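
One possible way around the hang, as a sketch rather than a tested recipe: apply the manifest without a blanket wait condition, then wait only on the resource that actually exposes the condition you care about. This assumes a kubernetes.core version in which k8s_info supports the wait options, and that a Deployment reports an Available condition. The Deployment name my-app and the namespace default are hypothetical placeholders.

```yaml
# Sketch only: "my-app" and "default" are hypothetical placeholders.
- name: Apply the manifest without a blanket wait condition
  kubernetes.core.k8s:
    state: present
    src: "{{ dir }}my.yaml"

- name: Wait only for the Deployment to report Available
  kubernetes.core.k8s_info:
    api_version: apps/v1
    kind: Deployment
    name: my-app          # hypothetical Deployment name
    namespace: default    # hypothetical namespace
    wait: true
    wait_sleep: 10
    wait_timeout: 600
    wait_condition:
      type: Available
      status: "True"
```

Resources without conditions, such as ServiceAccounts, are created by the first task and never waited on.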

Upvotes: 5

ome13

Reputation: 101

In the end, I used the answer from https://stackoverflow.com/users/2989261/rico, as it seems the most reliable (kubectl wait --for=condition=ready was not reliable with my kubectl version, 1.23.5). I just added a selector for all my pods, which is good practice because it filters out Completed jobs:

tasks:
- name: wait for pods from group app-xx to come up
  shell: kubectl get pods --selector=app-group=app-xx -o json
  register: kubectl_get_pods
  until: kubectl_get_pods.stdout|from_json|json_query('items[*].status.phase')|unique == ["Running"]
  retries: 12
  delay: 5

Upvotes: 0

Noam Manos

Reputation: 17040

A one-liner shell command would do:

kubectl wait pod --all -n ${the_namespace} --timeout=3m \
--for=condition=ready --field-selector=status.phase!=Succeeded

It will wait up to 3 minutes for all pods in the namespace to become Ready. The field selector excludes completed pods whose phase is Succeeded (the capital "S" is important).

Upvotes: 0

Xavi Martínez

Reputation: 2161

You can use kubernetes.core.k8s_info from the kubernetes.core collection.

For example, wait for cert-manager to be up in the cert-manager namespace:

- name: Wait until cert-manager is up
  kubernetes.core.k8s_info:
    kubeconfig: "{{ kubeconfig }}"
    api_version: v1
    kind: Pod
    namespace: cert-manager
  register: pod_list
  until: pod_list|json_query('resources[*].status.phase')|unique == ["Running"]
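
Note that an until loop without explicit settings falls back to Ansible's defaults of 3 retries with a 5-second delay, which is often too short for pods to start. A variant of the task above with an explicit retry budget might look like this; the values 30 and 10 are assumptions to tune for your cluster, not recommendations from the original answer.

```yaml
- name: Wait until cert-manager is up (explicit retry budget)
  kubernetes.core.k8s_info:
    kubeconfig: "{{ kubeconfig }}"
    api_version: v1
    kind: Pod
    namespace: cert-manager
  register: pod_list
  until: pod_list | json_query('resources[*].status.phase') | unique == ["Running"]
  retries: 30   # assumed value: up to 30 attempts
  delay: 10     # assumed value: 10 seconds between attempts
```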

Upvotes: 4

AnjK

Reputation: 3613

Kubernetes version v1.23.0 (changelog) added the ability for kubectl wait to wait on an arbitrary JSON path.

So, it seems that kubectl wait can be used now to wait for status phases also.

Wait for the pod "busybox1" to reach the status phase "Running":

kubectl wait --for=jsonpath='{.status.phase}'=Running pod/busybox1

You can use this command in an Ansible playbook task:

- name: Wait for the pods to come up with status 'Running'
  shell: "kubectl wait -n kube-system --for=jsonpath='{.status.phase}'=Running pods --selector tier=control-plane --timeout=120s"
  register: control_plane_pods_running

- debug: var=control_plane_pods_running.stdout_lines

Upvotes: 0

Eduardo Baitello

Reputation: 11376

The kubectl wait command

Kubernetes introduced the kubectl wait command in v1.11:

CHANGELOG-1.11:

  • kubectl wait is a new command that allows waiting for one or more resources to be deleted or to reach a specific condition. It adds a kubectl wait --for=[delete|condition=condition-name] resource/string command.

CHANGELOG-1.13:

  • kubectl wait now supports condition value checks other than true using --for condition=available=false

CHANGELOG-1.14:

  • Expanded kubectl wait to work with more types of selectors.
  • kubectl wait command now supports the --all flag to select all resources in the namespace of the specified resource types.

It is not intended to wait for phases, but for conditions. I think that waiting for conditions is much more robust than waiting for phases. See the following Pod conditions:

  • PodScheduled: the Pod has been scheduled to a node;
  • Ready: the Pod is able to serve requests and should be added to the load balancing pools of all matching Services;
  • Initialized: all init containers have completed successfully;
  • ContainersReady: all containers in the Pod are ready.

Using kubectl wait with Ansible

Suppose that you are automating a Kubernetes install with kubeadm + Ansible, and need to wait for the installation to complete:

- name: Wait for all control-plane pods become created
  shell: "kubectl get po --namespace=kube-system --selector tier=control-plane --output=jsonpath='{.items[*].metadata.name}'"
  register: control_plane_pods_created
  until: item in control_plane_pods_created.stdout
  retries: 10
  delay: 30
  with_items:
    - etcd
    - kube-apiserver
    - kube-controller-manager
    - kube-scheduler

- name: Wait for control-plane pods become ready
  shell: "kubectl wait --namespace=kube-system --for=condition=Ready pods --selector tier=control-plane --timeout=600s"
  register: control_plane_pods_ready

- debug: var=control_plane_pods_ready.stdout_lines

Result Example:

TASK [Wait for all control-plane pods become created] ******************************
FAILED - RETRYING: Wait all control-plane pods become created (10 retries left).
FAILED - RETRYING: Wait all control-plane pods become created (9 retries left).
FAILED - RETRYING: Wait all control-plane pods become created (8 retries left).
changed: [localhost -> localhost] => (item=etcd)
changed: [localhost -> localhost] => (item=kube-apiserver)
changed: [localhost -> localhost] => (item=kube-controller-manager)
changed: [localhost -> localhost] => (item=kube-scheduler)

TASK [Wait for control-plane pods become ready] ********************************
changed: [localhost -> localhost]

TASK [debug] *******************************************************************
ok: [localhost] => {
    "control_plane_pods_ready.stdout_lines": [
        "pod/etcd-localhost.localdomain condition met", 
        "pod/kube-apiserver-localhost.localdomain condition met", 
        "pod/kube-controller-manager-localhost.localdomain condition met", 
        "pod/kube-scheduler-localhost.localdomain condition met"
    ]    
}

Upvotes: 23

Rico

Reputation: 61699

I would try something like this (works for me):

tasks:
- name: wait for pods to come up
  shell: kubectl get pods -o json
  register: kubectl_get_pods
  until: kubectl_get_pods.stdout|from_json|json_query('items[*].status.phase')|unique == ["Running"]

You are basically collecting the phases of all the pods and reducing them to a list of unique values; the task won't complete until that list is exactly ["Running"]. For example, if not all of your pods are running yet, you will get something like ["Running", "Pending"].
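
If you would rather not depend on json_query (which needs the jmespath Python library on the controller), the same check can be sketched with built-in Jinja2 filters. This is an illustrative variant, not part of the original answer: the retries value is taken from the question, and the delay value is an assumption.

```yaml
- name: Wait for all pods to be Running (without json_query)
  ansible.builtin.command: kubectl get pods -o json
  register: kubectl_get_pods
  changed_when: false   # a read-only query should not report "changed"
  # Use ['items'] rather than .items, because .items on a dict resolves
  # to the dict method in Jinja2 instead of the "items" key.
  until: >-
    (kubectl_get_pods.stdout | from_json)['items']
    | map(attribute='status.phase') | unique | list == ['Running']
  retries: 20   # same as in the question
  delay: 10     # assumed value: seconds between attempts
```

An empty pod list yields [] rather than ['Running'], so the task keeps retrying until at least one pod exists and all of them are Running.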

Upvotes: 13
