Reputation: 1283
Column [Plate_No]
in [my_Table]
contains vehicle plate numbers for Greek and foreign vehicles. Generally Greek plate numbers contain only Greek characters, and foreign plates contain only Latin characters.
Now, some rows contain plate numbers with mixed characters and is incorrect.
for example:
DECLARE @my_Plates AS TABLE (
Plate_No NVARCHAR(50)
)
INSERT INTO @my_Plates (
Plate_No
)
SELECT 'AAA1234'
UNION ALL
SELECT 'ΑΑΑ1234'
UNION ALL
SELECT 'AΑA1234'
SELECT Plate_No, LOWER(Plate_No) AS LC_Plate_No
FROM @my_Plates
result:
Plate_No | LC_Plate_No
---------------------
AAA1234 | aaa1234 <-- correct
ΑΑΑ1234 | ααα1234 <-- correct
AΑA1234 | aαa1234 <-- incorrect
As you can see plate numbers are visually the same, but when LOWER applied, the difference is obvious.
How can those incorrect rows be detected ?
Any help would be appreciated.
Upvotes: 1
Views: 1202
Reputation: 1283
here's a workaround using LIKE
. It's heavy for 10.000.000+ rows, but it's OK since it's only a one time check for correction.
SELECT
Plate_No
, LOWER(Plate_No) AS LC_Plate_No
, (
CASE WHEN Plate_No LIKE '%[α-ω]%' AND Plate_No LIKE '%[a-z]%'
THEN '0'
ELSE '1'
END
) AS isCorrect
FROM @my_Plates
Upvotes: 0
Reputation: 21505
Here's one method which works when the column collation is Greek_CI_AS
:
DECLARE @my_Plates AS TABLE (
Plate_No NVARCHAR(50) COLLATE Greek_CI_AS
)
INSERT INTO @my_Plates (
Plate_No
)
SELECT UPPER(N'aaa1234')
UNION ALL
SELECT UPPER(N'ααα1234')
UNION ALL
SELECT UPPER(N'aαa1234')
SELECT Plate_No, LOWER(Plate_No)
FROM @my_Plates
WHERE Plate_No <> CAST(Plate_No AS varchar(50)) COLLATE Latin1_General_CI_AS
AND Plate_No LIKE '%[A-Z]%'
this probably won't perform well on large data sets - neither of the WHERE
conditions will be able to use an index on Plate_No
.
Upvotes: 2