Thursday, 21 July 2016

The 'SkipErrors' parameter for the Replication Distribution Agent

Having encountered numerous errors in SQL Server Replication, I can confidently say that the majority occur at the replication agent level, such as with the Distribution Agent, Log Reader Agent, Snapshot Agent, Queue Reader Agent, and others. Unfortunately, many of these issues are related to Primary and Foreign Key conflicts, which can take significant time to resolve, either individually or by reconfiguring the replication. To avoid downtime and allow operations to continue while we work on fixing these errors, we can use a helpful option: the 'SkipErrors' parameter. I've successfully used this parameter many times to bypass such issues, and it can also be applied to skip other types of errors as needed.
 
Today's post will show how to use the' SkipErrors' parameter which allows to skip specific errors so that the data synchronization process is not stopped. This parameter is configurable in the profile of the Distribution Agent and has as input the error number we may want to skip.
 
The following picture shows an error (with code 547 related to an Foreign Key issue) in the Distribution Agent process and we see how the transactions are being queued due to this error, consequently, there is a need to fix it so as to allow the rest of pending transactions are moved on. (Distribution Agent reads sequentially the table 'msrepl_commands' to get the command to execute in the subscribers, this means that First in the Queue is the First Out to be moved to subscribers.)



Other common errors where you can use 'SkipErrors' parameter is when there are not some rows in the subscriber to apply the changes (The row was not found at the Subscriber when applying the replicated command). The error code for this case is 20598.
 

I mentioned before that the 'SkipErrors' parameter is configurable inside the Distribution Agent profile and that is what we are going to do right now. Firstly, we need to create a customized profile based on the Default profile and write in the 'Value' column the numbers of the errors (to be skipped)  separated by colons as we can see in the following picture 20598:547.



Having done that, we may have to restart the Distribution Agent. Next time the Distribution Agent starts up, it will load the new customized profile with the error codes to be skipped (ignored). Finally, not only will we verify that many errors are skipped, but also the Distribution Agent is running with no problem.
 

Just to finish writing this post, keep in mind that the transactions with these errors were missed, which means that you will not be able to recover them and may affect your data consistency. In other words, we must use 'SkipErrors' parameter with extreme caution. Let me know any remark you may have. Thanks for reading.

Wednesday, 20 July 2016

How to query the windows users that access via windows groups from SQL Server

At times while monitoring a database server, we may need to know not only which SQL users are accessing, but also windows users for security and auditing purposes. Frankly, this task may be quite simple to get it completed when all of those users are created and visible inside the database engine. Despite this, it is not a surprise that windows users can access via windows groups which means that DBAs can not see them from SQL Server easily unless we have the manner to list them. Luckily, there is an extended stored procedure named “xp_logininfo” that we need to use to get that information. (The 'xp_logininfo' asks the Active Directory for the windows users.)
First of all, this system stored procedure takes two input parameters. The first one is the windows group name, and the second one is the value ‘members’.  For instance, it lists the windows users that access via the windows group ' MyDomain\SQLProdUsrs'. (You need to have “sysadmin” role to execute it.)
EXEC xp_logininfo  @acctname ='MyDomain\SQLPrdUsrs',  @option='members' 
Naturally, this is perfect if we only have one windows group to query. As time goes by, we are likely to get created more windows groups inside the database engine and in no time we will need to create an specific script to figure out the other windows users. Here is that code.
SET NOCOUNT ON

CREATE TABLE #WindowGroup(

    server_name varchar(100),

    account_name varchar(300),

    type char(8),

    privilege char(9) ,

    mapped_login_name varchar(300) ,

    permission_pathsysname  varchar(300)

 )

 

DECLARE @WindowGroupName varchar(max)

DECLARE @db [NCHAR](128)  

DECLARE cursor_WG CURSOR FOR SELECT [name] FROM sys.server_principals WHERE TYPE='G' 

 

OPEN cursor_WG

    FETCH NEXT FROM  cursor_WG INTO @WindowGroupName

    WHILE @@FETCH_STATUS= 0

        BEGIN

            INSERT #WindowGroup(account_name , type , privilege  ,mapped_login_name  ,permission_pathsysname   )

            EXEC xp_logininfo  @acctname =@WindowGroupName  ,  @option =  'members' 

        FETCH NEXT FROM cursor_WG INTO @WindowGroupName

    END 

CLOSE cursor_WG

DEALLOCATE cursor_WG

 

UPDATE #WindowGroup SET server_name=@@servername

select server_name AS ServerName, account_name as WindowsAccountName, Type, Privilege, mapped_login_name as MappedLoginName, permission_pathsysname as WindowsGroupName

FROM #WindowGroup 

ORDER BY permission_pathsysname,mapped_login_name

DROP TABLE #WindowGroup

 

SET NOCOUNT OFF


As you have seen, the T-SQL code is simply easy to understand. Clearly, it filters the windows groups by indicating the value ‘G’ for the Type column of the system view “sys.server_principals”, and eventually through the cursor each of them is read. Now I hope you make the most out of this script to have better visibility of the windows users accessing the databases. Thanks for reading again!

Saturday, 16 July 2016

SQL Mirroring Port could stop working unexpectedly

When it comes to diagnosing and fixing some SQL Mirroring issues we may have to face with bizarre causes that could stop working a database mirroring and as a result the state of it will be "disconnected". For instance, one of the most common problems is that the SQL mirroring port may stop working unexpectedly. Consequently, the whole database mirroring functionality will stop working too. We basically configure SQL Server mirroring to use the port 5022 by default which works good enough for the vast majority of cases. To be perfectly honest, there is not an urgent need to change it since the issue is not about the port number (or mirroring configuration) but the port itself.

While working on fixing the issue, we may think the database mirroring port may have stopped working or blocked for some security reason which we do not know accurately at the beginning of the situation. Nevertheless, the reason should be investigated further by the Network team so as to prevent from getting the same issue in the future again. Having clear the situation, I am going to show one way to have the database mirroring working again. To begin with, we need to figure out what is the current number port of the mirroring being used.



We can see that the mirroring port is 5022, so we can opt to change it.

ALTER ENDPOINT [Mirroring] AS TCP (LISTENER_PORT = 5023)

Only after successfully completing that will we have to restart the database engine. Finally, we need to restore the original port and then restart again the database engine.

ALTER ENDPOINT [Mirroring] AS TCP (LISTENER_PORT = 5022)

Having done it, we will see the database mirroring working properly. That's is all for now. I hope this tip also works for you. Let me know any remarks you may have. Thanks for reading.

Thursday, 30 June 2016

Error 20598: The row was not found at the Subscriber when applying the replicated command

Having transactional replication environments with read-only subscribers (which means that changes are not propagated back to the publisher), it is very important to understand that rows on subscribers must NOT be modified directly. Otherwise, we will get a big problem. Let me expand on what I am saying. For instance, if any row on the subscriber (which was replicated from publisher) is deleted and then when this same row on published is modified, the following error will be raised:
The row was not found at the Subscriber when applying the replicated command
Clearly, this issue is because the row to be updated on subscriber does not exist any longer while Distribution Agent is trying to propagate it. Therefore, there is a need to fix it as soon as possible to prevent the replication queue from growing so much. To solve this case, we must review the pending commands inside the 'distribution' database by using 'sp_browsereplcmds' in order to identify the affected transaction(s) and row(s), and then insert the missing row manually in the subscriber (or delete the command from queue, however, this recommendation can be taken into account only if someone deleted the row by mistake or you do not need it anymore).

Another technique we have is to use the 'SkipErrors' parameter which allows to skip errors of a certain type (for this issue the error number is 20598), which means that the affected transaction is simply ignored and skipped. Keep in mind that these sorts of error must be treated with extreme caution and a correct understanding of the situation.

That is all for now, let know any remark you may have. Thanks for reading.

Friday, 3 June 2016

How to move the files of database which has Replication, Mirroring, Log Shipping or AlwaysOn Settings

One of the challenging tasks in the life of a DBA is definitely moving all or some of the files of a database from one physical location to another one because of performance issues, maintenance requirements, disk space issues, etc. We usually move database files to another location by using Backup/Restore or Detach/Attach procedures. They are the most proper methods for most of the business cases but not for all. Let me expand on what I mean, for instance, those methods will not work with databases which have Replication, Mirroring, Log Shipping or AlwaysOn Settings because you will have to remove these settings before move them and then you should set up every setting again which could waste your time and have your database service stopped further than necessary. In this situation Backup/Restore or Detach/Attach simply is NOT an option because we need to make the database available as soon as possible. So, what we must do in order to move files of this type of database is by modifying the physical name of each database file we want to move. For instance, in the following code I will move 4 files (3 Data Files and 1 Log File):

ALTER DATABASE SalesDB MODIFY FILE (NAME=N'SalesDB_Data01', FILENAME= N'D:\SQLData\SalesDB\SalesDB_Data01.mdf')

ALTER DATABASE SalesDB MODIFY FILE (NAME=N'SalesDB_Data02', FILENAME= N'D:\SQLData\SalesDB\SalesDB_Data02.ndf')

ALTER DATABASE SalesDB MODIFY FILE (NAME=N'SalesDB_Data03', FILENAME= N'D:\SQLData\SalesDB\SalesDB_Data03.ndf')

ALTER DATABASE SalesDB MODIFY FILE (NAME=N'SalesDB_Log', FILENAME= N'E:\SQLLog\SalesDB\SalesDB_Log.ldf')


It is very important to verify that new database file folders already exist, if so, this should be the output results:

The file “SalesDB_Data01” has been modified in the system catalog. The new path will be used the next time the database is started.
The file “SalesDB_Data02” has been modified in the system catalog. The new path will be used the next time the database is started.
The file “SalesDB_Data03” has been modified in the system catalog. The new path will be used the next time the database is started.
The file “SalesDB_Log” has been modified in the system catalog. The new path will be used the next time the database is started.


What’s next? We must stop the SQL Engine Service and then manually move every database file to the new location we indicated in the code above. Finally, we have to start the SQL Engine service which will load the files from the new location. With this method you do not need to remove any setting mentioned before. This is extremely effective and there is no doubt that it will work. Having these files moved to the new location, the database will start without any problem. If not, you should make sure that the SQL Service account has Full Control permission on database files from the new location.
I hope this tip helps you to save time and it will ensure that your database will be available quickly. I will be pleased to answer any question you may have. Thanks for reading!.

Wednesday, 1 June 2016

How to test Read-Only Intent Connection from SQL Management Studio

As part of some SQL Server AlwaysOn Availability implementations, we may need to test Read-Only Intent Connection in someway to make sure that it is working well at SQL Server level. We can verify it by using SQL Management Studio. Let me expand on what I mean, AlwaysOn Technology give us an option to implement Read-Only Intent mode to enable SQL Server to redirect read-only connections to secondary replicas, it means off-loading Read-Only workloads to secondary replicas. Undoubtedly, this is a gripping feature, at first glance, it drew my attention to test it and I verified that is truly useful for business cases where we need to have Read-Only Intent feature working properly and automatically. It couldn’t have been better when I learned that I could use SQL Management Studio to achieve my purpose.
Keep it in mind that we need to have set up Read-Only Routing List before going to test it, obviously. Now carrying on this tip, let me show you some explicit pictures where you will see which parameters you should consider. First of all,  you need to go on 'Login' tab and write the Listener Name of your AlwaysOn Availability Group. In this example, my Listener Name is SRV1LIDBVB which represents the Virtual Server Name.



What’s next? going to 'Additional Connection Parameters' tab, you will see two parameters. You must write the database name for 'Database' which is in your AlwaysOn Availability Group and 'ReadOnly' for 'ApplicationIntent'. The both parameters are separated by a command.



Finally, click on 'Connect' and you will be connected to any secondary replica, which one replica? it depends on what you have set up in your Read-Only Routing List. This is all for now. I hope you find this post practical, effective and easy to put in practice and include it in your testing plan. Let me know any remark you may have. Thanks for reading!.

Tuesday, 24 May 2016

How to change collation of all columns

Just thinking about some interesting tools that could be useful for doing some DBA tasks, I would like to share my code to change the collation of all columns of all SQL Server tables (I mean User Tables, not System Tables). To begin with, I will show you a basic code to filter columns by an specific collation:

select tb.schema_id, tb.name,c.name,  c.collation_name, t.name, c.max_length, c.is_nullable,c.column_id 

from sys.columns c

inner join sys.types t on t.user_type_id= c.user_type_id

inner join sys.tables tb on  c.object_id=tb.object_id

where c.collation_name is not null 

and t.is_user_defined=0 and tb.is_ms_shipped=0 and tb.name<>'sysdiagrams'

and c.collation_name<>'SQL_Latin1_General_CP1_CI_AS'

order by tb.name, c.column_id

By executing it you will list every column that has a different collation you would like to change. Now I am going to show you the code that generates the code to change the collation of columns. After executing this code you must take the output and execute it to have your columns collation changed for a different one. In this example I am using SQL_Latin1_General_CP1_CI_AS collation as my wanted collation, I mean I want to have SQL_Latin1_General_CP1_CI_AS  as my new collation. You have to replace it according to your requirement.

select 'ALTER TABLE ' + QUOTENAME(SCHEMA_NAME(tb.schema_id)) + '.' + QUOTENAME(tb.name) + 

' ALTER COLUMN ' + QUOTENAME(c.name) +  ' ' + QUOTENAME(t.name) + '(' + CAST( case when T.NAME='NVARCHAR' THEN  c.max_length/2 

WHEN  T.NAME='NCHAR' THEN  c.max_length/2 ELSE c.max_length  END  AS VARCHAR(10)) +')' 

 +' COLLATE SQL_Latin1_General_CP1_CI_AS' + CASE WHEN c.is_nullable =1 THEN ' NULL ' else ' NOT NULL ;' END--,  c.collation_name,c.is_nullable 

from sys.columns c

inner join sys.types t on t.user_type_id= c.user_type_id

inner join sys.tables tb on  c.object_id=tb.object_id

where c.collation_name is not null 

and t.is_user_defined=0 and tb.is_ms_shipped=0 and tb.name<>'sysdiagrams'

and c.collation_name<>'SQL_Latin1_General_CP1_CI_AS'

order by tb.name, c.column_id


Mind your head because the execution of output code may fail as some columns could have some constraints (for instance some Foreign keys, Primary Keys, Uniques, etc.) or indexes that you may need to drop them first and then recreate them after you change the collation of the column.
Just to finish, I highly recommend testing this code on a copy of your database to check whether any error appears because of reasons explained above, then doing what is necessary. Finally, being totally sure that there is no error, you can proceed with the execution on your database in production environment. Please let me know any remark or question you may have. Thanks for reading!

Sunday, 1 May 2016

AlwaysOn AG Listener: The attempt to create the network name and IP address for the listener failed

While working on a heap of High Availability and Disaster Recovery solutions, I was challenged to deal with some complex errors that I had to overcome as fast as possible. Today’s post is going to show how we may solve one of them. I am speaking about the following error which is raised when we have to set up the AlwaysOn AG Listener:

The Windows Server Failover Clustering (WSFC) resource control API returned error code 5057.  The WSFC service may not be running or may not be accessible in its current state, or the specified arguments are invalid. 
The attempt to create the network name and IP address for the listener failed. The WSFC service may not be running or may be inaccessible in its current state, or the values provided for the network name and IP address may be incorrect. Check the state of the WSFC cluster and validate the network name and IP address with the network administrator. (Microsoft SQL Server, Error: 41009)

At times it can be quite easy to fix it, but it may become complicated as we do not have more details of the root cause, therefore, it does not give us any clue. In my experience working on this, the most common cause has to be about lacking of permission for the cluster name account so we have to make sure that this account has the 'Create Computer' and 'Read' permissions:



Once you have given right permissions and if the error is still there then you must check whether the IP Address is available to be assigned to the AlwaysOn AG Listener. It is simple to verify by making ping to IP Address which should be free. If not, ask your Administrator a new IP Address and try again.
I hope this practical post helps you. Let me know any remarks you may have. Until next post, thanks for reading!
HELLO, I'M PERCY REYES! I've been working as a senior SQL Server Database Engineer for over 20 years; I'm a three-time Microsoft Data Platform MVP. I'm a cryptographer conducting research on cryptographic Boolean functions and their applications.