William
William

Reputation: 82

How do I create an alert which fires when one of many machines fails to report a heartbeat?

Overall I'm trying to set up an azure alert to email me when a computer goes down by using the Heartbeat table.

Let's say I have 5 machines in my Azure subscription, and they each report once per minute to the table called Heartbeat, so it looks something like this:

Table Example

Currently, I can query "Heartbeat | where Computer == 'computer-name'| where TimeGenerated > ago(5m)" and figure out when one computer has not been reporting in the last 5 minutes, and is down (thank you to this great article for that query).

I am not very experienced with any query language, so I am wondering if it is possible to have 1 query which can check to see if there was ANY computer which stopped sending it's logs over the last 5-10 minute period, and thus would be down. Azure uses KQL, or Kusto Query Language for it's queries and there is documentation in the link above.

Thanks for the help

Upvotes: 2

Views: 672

Answers (1)

Yoni L.
Yoni L.

Reputation: 25895

one option is to calculate the max report time for each computer, then filter to the ones whose max report time is older than 5 minutes ago:

let all_computers_lookback = 12h
;
let monitor_looback = 5m
;
Heartbeat
| where TimeGenerated > ago(all_computers_lookback)
| summarize last_reported = max(TimeGenerated) by computer
| where last_reported < ago(monitor_looback)

another alternative:

  • the first part creates an "inventory" of all computers that reported at least once in the last (e.g. 12 hours).
  • the second part finds all computers that reported at least once in the last (e.g. 5 minutes)
  • the third and final part finds the difference between the two (i.e. all computers that didn't report in the last 5 minutes)

Note: if you have more than 1M computers, you can use the join operator instead of the in() operator

let all_computers_lookback = 12h
;
let monitor_looback = 5m
;
let all_computers = 
    Heartbeat
    | where TimeGenerated > ago(all_computers_lookback)
    | distinct computer
;
let reporting_computers = 
    Heartbeat
    | where TimeGenerated > ago(monitor_looback)
    | distinct computer
;
let non_reporting_computers =
    all_computers
    | where computer !in(reporting_computers)
;
non_reporting_computers

Upvotes: 2

Related Questions