When it comes to monitoring databases we have to do it proactively by using alerts implemented natively or via T-SQL code inside SQL Jobs running recurrently (I know we do have the AlwaysOn Dashboard, but personally I do not have time to check it all the time). As a result of this situation, so rapidly do we have databases working on production environment that we may need to have alerts working as well. In this post I am going to show how to have the most important AlwaysOn Availability Group alerts implemented. To begin with, I would basically suggest creating alerts only for these error messages.
Undoubtedly, we may need more alerts. Therefore, we will need to look into ‘sys.messages’ system object and look for error messages closely related to AlwaysOn Availability Group feature. Only after identifying them will we be able to create particular alerts taking the ‘message_id’ number. To illustrate, I will query ‘sys.messages’ by using keywords like ‘Availability’, ‘suspended’, ‘failed’, ‘disconnected’, etc.
SELECT * FROM sys.messages
WHERE ( [text] LIKE '%Availability%Disconnected%' OR [text] LIKE '%Availability%not%synchroni%'
OR [text] LIKE '%Availability%suspended%' OR [text] LIKE '%Availability%failed%' OR [text] LIKE '%Availability%chang%role%'
OR [text] LIKE '%Availability%resumed%' ) AND language_id=1033
As you have seen, they are about 63 messages (depending on the SQL Server version you are working on). I do believe that these messages are the be-all and end-all so we might not need to implement additional alerts. Put differently, they are enough. Nevertheless, we might filter out some messages we do not consider important and include others to monitor AlwaysOn Failover Cluster instead.
SELECT * FROM sys.messages
WHERE ( [text] LIKE '%Availabiliy%replica%' OR [text] LIKE '%Database%replica%' OR [text] LIKE '%primary%role%'
OR [text] LIKE '%secondry%role%' OR [text] LIKE '%availability%group%' OR [text] LIKE '%WSFC%'
OR [text] LIKE '%primary%replica%' OR [text] LIKE '%secondary%replica%' OR [text] LIKE '%alwayson%')
AND [text] NOT LIKE '%No user action is required.%'
AND severity<>10
AND language_id=1033
After having got the error number, the next step is to create the alert for the error by using ‘sp_add_alert’ system stored procedure which accepts basic parameters like the name for the alert and ‘message_id’ (taken from ‘sys.messages’). The rest of parameters will be the same for each alert. So, you can create the other alerts by replacing the name and ‘message_id’, for instance, the following example is based on the error number 41418. (We also create the notification to send an email to us when the alert is raised. Obviously, not until having successfully configured SQL Agent service to use Database Mail will you be able to receive the email.)
USE [msdb]
GO
EXEC msdb.dbo.sp_add_alert @name=N'41418 HADR Alert',
@message_id=41418,
@severity=0,
@enabled=1,
@delay_between_responses=0,
@include_event_description_in=1,
@notification_message=N'Contact your Database Administrator urgently.',
@job_id=N'00000000-0000-0000-0000-000000000000'
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'41418 HADR Alert', @operator_name=N'DBA', @notification_method = 1
To try out to see whether or not the alert is working, we will need to raise the error manually via RAISERROR command with ‘LOG’ parameter (to see the message logged in the SQL Server error log).
RAISERROR(41418,16,1) WITH LOG;
Having done that, we can see the following message in error log:
Likewise, you will get the email.
To sum up, I am convinced that these sorts of alert will always work for monitoring Availability Groups and others. You just need to spend some time on figuring out about the error messages and then go ahead and create the right alert for you. I hope you enjoy this post and let me know any remark you may have. Thanks for reading again.