Reputation: 141
How do i get special characters in column on SqlServer ?
I got e-mail list , and i have to find special caracteres like example bellow
**Email**
Jó[email protected]
Khã[email protected]
As u see above , there's '~' and '´'as special characters . Might be appear others characters like '..' or other else.
Im working on Sql Server 2012 ,
Anyone has suggestion to solve it ?
Upvotes: 0
Views: 1521
Reputation: 69759
To extract the special characters you would first need to split your string into rows, so you can query each individually, which you can do with a numbers table. If you don't have one, they are very easy to create on the fly:
WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3)
SELECT Number
FROM Numbers;
This gives a list of numbers from 1-10000. More on this here.
Then you can join this to your data with the condition Number < LEN(Email)
to ensure you get one row back for each character in the email, then use SUBSTRING()
to extract the character at the position n
:
DECLARE @T TABLE (ID INT IDENTITY, Email NVARCHAR(255));
INSERT @T (Email)
VALUES (N'JóhnSnó[email protected]'), (N'Khã[email protected]'), ('[email protected]');
WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3)
SELECT t.ID,
t.Email,
Character = SUBSTRING(t.Email, n.Number, 1)
FROM @T AS t
INNER JOIN Numbers n
ON n.Number < LEN(t.Email)
ORDER BY t.ID;
Which gives:
ID Email Character
-----------------------------
1 Jó[email protected] J
1 Jó[email protected] ó
1 Jó[email protected] h
1 Jó[email protected] n
1 Jó[email protected] S
1 Jó[email protected] n
1 Jó[email protected] ó
1 Jó[email protected] w
.....
Then you can extract the special characters by converting them to VARCHAR
with the collation SQL_Latin1_General_Cp1251_CS_AS
, and checking that to the original:
DECLARE @T TABLE (ID INT IDENTITY, Email NVARCHAR(255));
INSERT @T (Email)
VALUES (N'JóhnSnó[email protected]'), (N'Khã[email protected]'), ('[email protected]');
WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3),
AllCharacters as
( SELECT t.ID,
t.Email,
Character = SUBSTRING(t.Email, n.Number, 1),
Position = n.Number
FROM @T AS t
INNER JOIN Numbers n
ON n.Number < LEN(t.Email)
)
SELECT ac.ID, ac.Character, ac.Position
FROM AllCharacters AS ac
WHERE CONVERT(CHAR(1), ac.Character) COLLATE SQL_Latin1_General_Cp1251_CS_AS <> ac.Character
ORDER BY ac.ID;
Result
ID Email Character Position
----------------------------------------------------
1 JóhnSnó[email protected] ó 2
1 JóhnSnó[email protected] ó 7
2 Khã[email protected] ã 3
Then finally, if required you can use XML extensions to concatenate these characters into a single column:
DECLARE @T TABLE (ID INT IDENTITY, Email NVARCHAR(255));
INSERT @T (Email)
VALUES (N'JóhnSnó[email protected]'), (N'Khã[email protected]'), ('[email protected]');
WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3),
AllCharacters as
( SELECT t.ID,
t.Email,
Character = SUBSTRING(t.Email, n.Number, 1),
Position = n.Number
FROM @T AS t
INNER JOIN Numbers n
ON n.Number < LEN(t.Email)
), SpecialCharacters AS
( SELECT ac.ID, ac.Character, ac.Position
FROM AllCharacters AS ac
WHERE CONVERT(CHAR(1), ac.Character) COLLATE SQL_Latin1_General_Cp1251_CS_AS <> ac.Character
)
SELECT t.ID,
t.Email,
SpecialCharacters = ISNULL(STUFF(s.SpecialCharacterList.value('.', 'NVARCHAR(255)'), 1, 2, ''), '')
FROM @T AS T
CROSS APPLY
( SELECT CONCAT(N', ', s.Character, '(', Position, ')')
FROM SpecialCharacters AS s
WHERE s.ID = t.ID
ORDER BY Position
FOR XML PATH(''), TYPE
) s (SpecialCharacterList)
ORDER BY ID;
Result
ID Email SpecialCharacters
------------------------------------------------
1 JóhnSnó[email protected] ó(2), ó(7)
2 Khã[email protected] ã(3)
3 [email protected]
As an aside, it may be better suited to your needs to store in a table what you count as special characters rather than relying on the code pages for specific collations, if you were to do this, you would just need to change this line:
WHERE CONVERT(CHAR(1), ac.Character) COLLATE SQL_Latin1_General_Cp1251_CS_AS <> ac.Character
For:
WHERE EXISTS (SELECT 1 FROM MySpecialCharacterTable AS sct WHERE sct.Character = ac.Character)
Upvotes: 4