Reputation: 12395
I have a systemd service that runs a SQL script that refreshes a bunch of materialized views. The service runs every 5 minutes and is deployed by Ansible. If the SQL script fails, I would like to send an email notifying us of the failure. The current code (see the yml chunk below) sends the notification email every 5 minutes or so, until one of us fixes the problem or stops the service. This is too frequent: one email is enough, and is exactly what I need.
How can I send one and only one email, even if the script fails repeatedly?
I am considering using a wrapper script such as this pseudocode, but it looks ugly:
#!/bin/bash
# Runs every 5 min: refresh the matviews; on failure, send one email
# and leave a marker file so that repeated failures stay silent.
if ! psql -f refresh_matviews.sql; then
    if [ ! -e refresh_matviews.failed.log ]; then
        echo "failed!" | mail -s "refresh matviews failed" [email protected]
        echo "seen" > refresh_matviews.failed.log
    fi
fi
If such a wrapper script is used, then whoever fixes the problem needs to also manually clear the (now outdated) failure file (rm refresh_matviews.failed.log), so that any new failure triggers a new email.
The relevant chunk of the yml file for ansible:
- name: Add systemd service that refreshes matviews
  copy:
    content: |
      # This service unit refreshes matviews
      #
      [Unit]
      Description=Refreshes matviews
      Wants=refresh_matviews.timer

      [Service]
      User=galaxy
      Type=oneshot
      ExecStart=/bin/bash -c '/usr/bin/psql ... -f /path/to/refresh_matviews.sql || echo "WARNING" | /usr/bin/mail -s "not ok: refresh matviews" [email protected]'

      [Install]
      WantedBy=multi-user.target
    dest: /etc/systemd/system/refresh_matviews.service
    owner: root
    group: root
    mode: 0644

- name: Add systemd timer that refreshes matviews
  copy:
    content: |
      # This timer unit refreshes matviews
      #
      [Unit]
      Description=Refreshes matviews
      Requires=refresh_matviews.service

      [Timer]
      Unit=refresh_matviews.service
      OnCalendar=*-*-* *:00/5:00

      [Install]
      WantedBy=timers.target
    dest: /etc/systemd/system/refresh_matviews.timer
    owner: root
    group: root
    mode: 0644
It seems that ansible/systemd should have something similar to what I need, but this is all I could find:
Upvotes: 5
Views: 750
Reputation: 12120
With respect to the given use case description
... until one of us fixes the problem or stops the service.
and the comment which was made, you may consider the SQL script error status as a fact about the system. So you could simply introduce a Custom Fact ("add dynamic facts by adding executable scripts to facts.d"), so that the next time fact gathering is run, your facts will include the script error status and you can proceed further with Conditionals based on ansible_facts.
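Applied to this question, a minimal sketch could deploy an executable fact that reports whether the last refresh left a failure marker behind. The fact name, the marker path /var/lib/refresh_matviews/failed, and who creates that marker (the service or a wrapper) are assumptions for illustration, not part of the original setup:

- name: Add custom fact reporting matview refresh status
  copy:
    content: |
      #!/bin/bash
      # Executable facts must print JSON (or INI) to stdout;
      # this one shows up as ansible_local.refresh_matviews.
      if [ -e /var/lib/refresh_matviews/failed ]; then
        echo '{"failed": true}'
      else
        echo '{"failed": false}'
      fi
    dest: /etc/ansible/facts.d/refresh_matviews.fact
    owner: root
    group: root
    mode: 0755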
Even if currently
... the service (annot.: only) is deployed by Ansible.
this approach will help to maintain the status via Ansible as well as via a separate script which sends the email alerts.
Regarding
... but could not figure out how exactly I can use custom facts and conditionals based on ansible facts in my specific case.
I've added some more specific information on how to implement and use a Custom Fact, focusing on the Ansible part only.
Use Case and Rapid Prototype
How to implement Custom Facts?
First I need to find out the Streaming Replication Status on the secondary node. The following query can be run frequently as a script or cron job on that node.
psql -c "SELECT pg_is_in_recovery(),pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp()" -x -t
For facts.d or local facts, the documentation says:

To use facts.d, create an /etc/ansible/facts.d directory on the remote host or hosts. ... Add files to the directory to supply your custom facts. All file names must end with .fact. The files can be JSON, INI, or executable files returning JSON.
Since I am going to do further processing with Ansible (Python), I prefer to pre-process the result into JSON, as that will make the later processing easier.
psql -c "SELECT json_agg(t) FROM (SELECT pg_is_in_recovery(),pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp()) t" -x -t | cut -d "|" -f 2
The output can be written directly into the fact file via

> /etc/ansible/facts.d/streaming.fact

Depending on testing and outcome, possible add-ons could be

| tr -d "[:blank:]\n"
# or
| tr -d "[:blank:][]\n" # <- I've used this in my example

You can also capture the return code of the script or cron job, like in this clumsy variant

; echo "{\"rc\":\"${?}\"}" > /etc/ansible/facts.d/script.fact

or something via

./script.sh; jq --null-input --monochrome-output --arg rc "$?" '$ARGS.named' > /etc/ansible/facts.d/script.fact
So much for the implementation, which produces two files on the secondary node.
~/test$ tree /etc/ansible/facts.d/
/etc/ansible/facts.d/
├── script.fact
└── streaming.fact
~/test$ cat /etc/ansible/facts.d/script.fact
{"rc":"0"}
~/test$ cat /etc/ansible/facts.d/streaming.fact
{"pg_is_in_recovery":true,"pg_is_wal_replay_paused":false,"pg_last_wal_receive_lsn":"1/AB2345CD","pg_last_wal_replay_lsn":"1/AB2345CD","pg_last_xact_replay_timestamp":"2023-02-01T09:00:00.00000+01:00"}
How to use the Custom Fact?
A minimal example playbook
---
- hosts: localhost
  become: false
  gather_facts: true

  tasks:

    - name: Show Facts
      debug:
        msg: "{{ ansible_facts.ansible_local }}"
will result in output like
TASK [Show Facts] ****************************************************
ok: [localhost] =>
  msg:
    script:
      rc: '0'
    streaming:
      pg_is_in_recovery: true
      pg_is_wal_replay_paused: false
      pg_last_wal_receive_lsn: 1/AB2345CD
      pg_last_wal_replay_lsn: 1/AB2345CD
      pg_last_xact_replay_timestamp: '2023-02-01T09:00:00.00000+01:00'
or, in case fact file generation failed,
TASK [Gathering Facts] **********************************************************************************
[WARNING]: error loading facts as JSON or ini - please check content: /etc/ansible/facts.d/script.fact
ok: [localhost]
TASK [Show Facts] ***************************************************************************************
ok: [localhost] =>
  msg:
    script: 'error loading facts as JSON or ini - please check content: /etc/ansible/facts.d/script.fact'
A Conditional based on (even custom) ansible_facts could then look like
- name: Show Facts
  debug:
    msg: "{{ ansible_facts.ansible_local.streaming }}"
  when: not ansible_facts.ansible_local.script.rc | bool # if there was no failure, rc=0
In case of a failed script run producing an exit code of 1 and a fact file content of rc: '1', it would just skip the task.
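Transferred to the question's use case, a notification task gated on such a custom fact might look roughly like the sketch below. The fact name refresh_matviews.failed matches the hypothetical fact shown earlier, and the mail parameters are placeholders taken from the question; suppressing repeated emails would still need a marker (for example a second fact, or a cached fact) set after the first notification:

- name: Send failure notification
  mail:
    to: [email protected]
    subject: "not ok: refresh matviews"
    body: "refresh_matviews.sql failed"
  when: ansible_facts.ansible_local.refresh_matviews.failed | bool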
Upvotes: 1
Reputation: 44760
Enable a persistent fact cache backend in your project ansible.cfg. See this entry point in the documentation. For the example I use a simple JSON file cache:
[defaults]
fact_caching=jsonfile
fact_caching_connection=/tmp/ansible_cache
Once the cache is enabled, the idea is to:

- run the possibly failing task with ignore_errors and register its result,
- send the email only if the previous run (read back from the cached fact) succeeded and the current run failed,
- store the result of the current run as a cacheable fact for the next run.

Here is some pseudo code to give you the idea:
- name: My maybe failing task
  command: /bin/false
  ignore_errors: true
  register: my_cmd

- name: Send an email if relevant
  mail:
    # Your mail task options
  when:
    - previous_run_ok | d(true) | bool
    - my_cmd is failed

- name: Register result of this run for later (cached fact)
  set_fact:
    previous_run_ok: "{{ my_cmd is success }}"
    cacheable: true # persist into the fact cache for the next run
Upvotes: 3