Last week we performed Windows patching, and afterwards we confirmed that everything was fine on the SQL Server: all the databases were accessible and ONLINE, all the SQL Server services were up and running, and SQL Server Agent also appeared fine.
Later, when we looked for the latest database backups, we could not find any, even though a maintenance plan was in place to take them.
So we decided to do an RCA to find out why the scheduled backups were not happening on the server. We checked the SQL Server Error Log, the Windows logs and the Application Log, but could not find anything.
Finally, when we checked the SQL Server Agent log, we found the following errors, repeated continuously for 3 days:
[393] Waiting for SQL Server to recover database 'msdb'...
[298] SQLServer Error: 16389, Communication link failure [SQLSTATE 08S01] (ConnCheckIfDBIsOnline)
[298] SQLServer Error: 10004, Communication link failure [SQLSTATE 08S01] (ConnCheckIfDBIsOnline)
[298] SQLServer Error: 16389, Communication link failure [SQLSTATE 08S01] (ConnCheckIfDBIsOnline)
[393] Waiting for SQL Server to recover database 'msdb'...
[298] SQLServer Error: 16389, Communication link failure [SQLSTATE 08S01] (ConnCheckIfDBIsOnline)
[298] SQLServer Error: 10004, Communication link failure [SQLSTATE 08S01] (ConnCheckIfDBIsOnline)
[298] SQLServer Error: 16389, Communication link failure [SQLSTATE 08S01] (ConnCheckIfDBIsOnline)
[393] Waiting for SQL Server to recover database 'msdb'...
[298] SQLServer Error: 16389, Communication link failure [SQLSTATE 08S01] (ConnCheckIfDBIsOnline)
[298] SQLServer Error: 10004, Communication link failure [SQLSTATE 08S01] (ConnCheckIfDBIsOnline)
[298] SQLServer Error: 16389, Communication link failure [SQLSTATE 08S01] (ConnCheckIfDBIsOnline)
[393] Waiting for SQL Server to recover database 'msdb'...
[298] SQLServer Error: 16389, Communication link failure [SQLSTATE 08S01] (ConnCheckIfDBIsOnline)
[298] SQLServer Error: 10004, Communication link failure [SQLSTATE 08S01] (ConnCheckIfDBIsOnline)
[298] SQLServer Error: 16389, Communication link failure [SQLSTATE 08S01] (ConnCheckIfDBIsOnline)
[393] Waiting for SQL Server to recover database 'msdb'...
[298] SQLServer Error: 16389, Communication link failure [SQLSTATE 08S01] (ConnCheckIfDBIsOnline)
[298] SQLServer Error: 233, Communication link failure [SQLSTATE 08S01] (ConnCheckIfDBIsOnline)
[298] SQLServer Error: 233, Shared Memory Provider: No process is on the other end of the pipe. [SQLSTATE 08S01] (ConnCheckIfDBIsOnline)
[393] Waiting for SQL Server to recover database 'msdb'...
My machine details are as follows:
Windows Server: Windows Server 2016
SQL Server Version: Microsoft SQL Server 2016 (SP1) (KB3182545) - 13.0.4001.0 (X64) Oct 28 2016 18:17:30 Copyright (c) Microsoft Corporation
Standard Edition (64-bit) on Windows Server 2016 Datacenter 6.3 (Build 14393: ) (Hypervisor)
Can anyone tell us why SQL Server Agent was not showing as offline, and how SQL Server could be in an ONLINE state while the SQL Server Agent log kept showing "Waiting for SQL Server to recover database 'msdb'..."? Also, we check the status of the SQL Server services daily through the "sys.dm_server_services" DMV, but it never showed us that SQL Server Agent was not up and running.
Besides the workaround, we have added an alert for this string in the log, as it can indicate that msdb has not recovered and Agent has hung. If you see it more than once within a given time span, it's worth checking whether Agent is hung. In the meantime, if Agent isn't working you won't have your jobs, so you won't know whether they succeed or fail, which normal "look for failures" logic can miss.
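The alerting idea above (count occurrences of the msdb-recovery wait message within a time window and flag the Agent if there is more than one) could be sketched in Python roughly as follows. This is a minimal sketch, not the poster's actual alert: it assumes SQL Server Agent log lines (e.g. from SQLAGENT.OUT) look like `2019-01-15 10:00:00 - ? [393] Waiting for SQL Server to recover database 'msdb'...`, and the function names, the 30-minute window, the threshold and the sample lines are all hypothetical choices for illustration.

```python
import re
from datetime import datetime, timedelta

# ASSUMED Agent log line shape: "<yyyy-mm-dd hh:mm:ss> - <flag char> [<msg id>] <text>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+-\s+\S\s+\[(\d+)\]\s+(.*)$"
)
MSDB_WAIT = "Waiting for SQL Server to recover database 'msdb'"


def msdb_wait_times(lines):
    """Yield the timestamp of every [393] 'waiting for msdb' line."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group(2) == "393" and MSDB_WAIT in m.group(3):
            yield datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")


def agent_possibly_hung(lines, now, window=timedelta(minutes=30), threshold=1):
    """Return True if the msdb wait appears more than `threshold` times
    in the `window` ending at `now` (hypothetical defaults)."""
    recent = [t for t in msdb_wait_times(lines) if now - window <= t <= now]
    return len(recent) > threshold


# Hypothetical sample lines in the assumed format
sample = [
    "2019-01-15 10:00:00 - ? [393] Waiting for SQL Server to recover database 'msdb'...",
    "2019-01-15 10:00:05 - ! [298] SQLServer Error: 16389, Communication link failure [SQLSTATE 08S01] (ConnCheckIfDBIsOnline)",
    "2019-01-15 10:05:00 - ? [393] Waiting for SQL Server to recover database 'msdb'...",
]
print(agent_possibly_hung(sample, now=datetime(2019, 1, 15, 10, 10)))  # True
```

Counting within a window, rather than alerting on a single occurrence, follows the answer's suggestion that one message on its own may be normal during startup while repeated messages point to a hung Agent.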