Daniel Björk

Reputation: 2507

Azure Monitor avoid false positives on VPN disconnect

We are using Azure Monitor to detect when our Virtual Network Gateway S2S VPN connections disconnect (we have a few connections in each environment). We would like to reconfigure it so that we only get alerts if a connection has been down for more than one minute, to avoid alerts when the tunnel is reset.

Today we are using this Log Analytics query, which creates false alerts. Do you have any suggestions for how we can create an alert that ignores these short resets?

 AzureDiagnostics
 | where Category == "TunnelDiagnosticLog"
 | order by TimeGenerated

Here is an example of what we don't want to trigger an alert. Note that just excluding the GlobalStandby change events won't do it, since it's not guaranteed that the tunnel connects again.

[screenshot of the example log entries; image not included]

Configuration in Azure Monitor: [screenshot; image not included]

Upvotes: 1

Views: 1947

Answers (2)

Daniel Björk

Reputation: 2507

Using Log Analytics, I came up with this query. It checks the next line in the log to see whether it is a TunnelConnected event and compares the time span between the two events.

AzureDiagnostics
| where Category == "TunnelDiagnosticLog"
// Skip the most recent 2 minutes so a reconnect has time to be logged; look back 10 hours.
| where TimeGenerated < ago(120s) and TimeGenerated > ago(600m)
// next() needs a serialized row set; sorting by time makes "next" the following event.
| sort by TimeGenerated asc
| extend NextStatus = next(OperationName),
    Downtime = next(TimeGenerated) - TimeGenerated
// Result is 0 for a connect event, or for a disconnect followed by a reconnect
// within 1 minute; any other disconnect gets 1.
| extend Result = iif(
    (OperationName == "TunnelDisconnected"
        and NextStatus == "TunnelConnected"
        and Downtime < 1m)
    or OperationName == "TunnelConnected", 0, 1)
| where OperationName == "TunnelDisconnected" and Result == 1
| project TimeGenerated,
    OperationName,
    NextStatus,
    Result,
    Downtime,
    Resource,
    ResourceGroup,
    _ResourceId
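
To try the next-row comparison without waiting for real tunnel events, the same logic can be run against a small inline table. This is only a sketch: the `SampleEvents` name and all timestamps are made up for illustration, and only the two columns the comparison needs are included.

```kusto
// Hypothetical sample data, not taken from a real gateway log.
let SampleEvents = datatable(TimeGenerated: datetime, OperationName: string)
[
    // Tunnel reset: reconnects after 20 seconds
    datetime(2024-01-01 10:00:00), "TunnelDisconnected",
    datetime(2024-01-01 10:00:20), "TunnelConnected",
    // Longer outage: reconnects after 5 minutes
    datetime(2024-01-01 11:00:00), "TunnelDisconnected",
    datetime(2024-01-01 11:05:00), "TunnelConnected",
    // No reconnect logged at all
    datetime(2024-01-01 12:00:00), "TunnelDisconnected"
];
SampleEvents
// Sorting serializes the rows so next() refers to the following event in time.
| sort by TimeGenerated asc
| extend NextStatus = next(OperationName),
    Downtime = next(TimeGenerated) - TimeGenerated
| where OperationName == "TunnelDisconnected"
// Drop disconnects that are followed by a reconnect within one minute.
| where not(NextStatus == "TunnelConnected" and Downtime < 1m)
```

With this data the 10:00 reset should be filtered out, while the 11:00 and 12:00 disconnects remain. Note that `next()` looks at the following row regardless of which connection it belongs to, so events from several connections in the same result set can interleave.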

Upvotes: 0

stackoverflowusrone

Reputation: 536

You can try creating a Metric measurement log alert. In your log query, set AggregatedValue to the count of disconnections, aggregated by the column with values GatewayTenantWorker... (and any other columns as needed) and binned per minute. Then configure the alert with a threshold of 0 (to catch any disconnection) and trigger it on consecutive breaches greater than 1 (for more than 1 minute), or greater than 2 (for more than 2 minutes, to reduce false alerts even further).

This should fire an alert when any of the VPN connections has been disconnected for more than 1 (or 2) minute(s).
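
The log query for such an alert could be sketched roughly as follows. The answer does not name the column to aggregate by, so `Resource` is used here purely as a stand-in assumption; replace it with the column that actually identifies each connection in your logs. The `OperationName` filter is likewise borrowed from the other answer's query.

```kusto
AzureDiagnostics
| where Category == "TunnelDiagnosticLog"
// Assumption: disconnections are logged with this operation name.
| where OperationName == "TunnelDisconnected"
// One row per minute and per resource; "Resource" is a placeholder grouping column.
| summarize AggregatedValue = count() by bin(TimeGenerated, 1m), Resource
```

The threshold (greater than 0) and the consecutive-breaches setting are configured on the alert rule itself, not in the query.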

Assumptions about the data:

  • Tunnel resets are resolved within a minute.
  • During an actual long disconnection, the current status (Disconnected) is logged once per minute. The solution above works only in this case.

If these assumptions do not hold, more information is needed about what the log data looks like during a long disconnection.

Upvotes: -1
