Wednesday, 5 April 2017

Looking into Modern_Spanish and Latin1_General

The collations Modern_Spanish and Latin1_General are both Windows collations and support the same character set (Code Page 1252), but there are some differences between both collations related to the treatment of some characters in terms of sorting/doing comparisons. Some of the differences are noticeable especially when Accent-Insensitive is used, in Modern_Spanish n and ñ are considered different characters whereas in Latin1_General they are considered a character and an accented version of the same character.

Today I am going to show off an example to make it clear so that you can take it on board to pre-empt unforeseen results or conflict of collations. This example is based on Modern_Spanish_CI_AI and Latin1_General_CI_AI.

create table TempTable (
   Text_in_Latin1  varchar(100) collate Latin1_General_CI_AI,
   Text_in_Modern varchar(100) collate Modern_Spanish_CI_AI
)
go
insert into TempTable values ('Español', 'Español')
insert into TempTable values ('Espanol', 'Espanol')
insert into TempTable values ('Espanól', 'Espanól')
go
select Text_in_Latin1 from TempTable where Text_in_Latin1='Espanol'
go
select Text_in_Modern from TempTable where Text_in_Modern='Espanol'
go
drop table TempTable

Text_in_Latin1
--------------------------------------
Español
Espanol
Espanól

Text_in_Modern
--------------------------------------
Espanol
Espanól

As you can see, the first result demonstrates that Español, Espanol and Espanól are treated as the same word when we use Accent-Insensitive whereas in the second one Español and Espanol are treated as different words. It is worth noting that you only must change the collation if you really understand the behaviour change. For example, one might think that Latin1_General_CI_AS (Windows Collation) and SQL_Latin1_General_CP1_CI_AS (SQL Collation) are the same, but they are not the same, there is a slight difference in the short behaviour between both for specific special characters like '-'. Here another example about it.

create table TempTable (
   Text_in_Latin1  varchar(100) collate Latin1_General_CI_AS,
   Text_in_SQL_Latin1 varchar(100) collate SQL_Latin1_General_CP1_CI_AS
)
go
insert into TempTable values ('es-pe', 'es-pe')
insert into TempTable values ('espe', 'espe')
go
select Text_in_Latin1 from TempTable order by Text_in_Latin1
go
select Text_in_SQL_Latin1 from TempTable order by Text_in_SQL_Latin1
go
drop table TempTable

Text_in_Latin1
---------------------------------------
espe
es-pe

Text_in_SQL_Latin1
---------------------------------------
es-pe
espe

To be honest, I personally like having the same collation for all databases, but it has to be done carefully and, more importantly, if you really want that change. That is all for now, let me know any remarks you may have.

No comments:

Post a Comment

Let me know any remarks or questions you may have. Please write down your name.

HELLO, I'M PERCY REYES! I've been working as a senior SQL Server Database Engineer for over 20 years; I'm a three-time Microsoft Data Platform MVP. I'm a cryptographer conducting research on cryptographic Boolean functions and their applications.