Reputation: 656
I'm using both the following methods to encode in base 64 a Chinese string. Problem is that I'm having Pz8=
as output, which decoded is ??
What's wrong with this and how can I fix it?
Method 1
CREATE FUNCTION [dbo].[base64Encode] (@input VARCHAR(MAX))
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE @output NVARCHAR(MAX),
@bits VARBINARY(3),
@pos INT
SET @pos = 1
SET @output = ''
WHILE @pos <= LEN(@input)
BEGIN
SET @bits = CONVERT(VARBINARY(3), SUBSTRING(@input, @pos, 3))
SET @output = @output + SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/', SUBSTRING(@bits, 1, 1) / 4 + 1, 1)
SET @output = @output + SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/', SUBSTRING(@bits, 1, 1) % 4 * 16 + SUBSTRING(@bits, 2, 1) / 16 + 1, 1)
SET @output = @output + SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/', SUBSTRING(@bits, 2, 1) % 16 * 4 + SUBSTRING(@bits, 3, 1) / 64 + 1, 1)
SET @output = @output + SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/', SUBSTRING(@bits, 3, 1) % 64 + 1, 1)
SET @pos = @pos + 3
END
RETURN (LEFT(@output, LEN(@output) - 3 + LEN(@bits)) + REPLICATE('=', 3 - LEN(@bits)))
END
SELECT [dbo].[base64Encode]('你好')
Method 2
SELECT CAST('你好' as varbinary(max)) FOR XML PATH(''), BINARY BASE64
Upvotes: 0
Views: 1629
Reputation: 67311
You are missing the N
to mark a string literal as unicode:
SELECT N'你好' AS unicode
,'你好' AS ASCII
Try this to get a base64
out of your chinese charcters and vice versa:
SELECT (SELECT CAST(N'你好' AS VARBINARY(MAX)) FOR XML PATH(''),TYPE).value(N'.','nvarchar(max)');
You get this base64
result: YE99WQ==
This is the way to re-convert the base64
to the original value
SELECT CAST(CAST('<x>' + 'YE99WQ==' + '</x>' AS XML).value('.','varbinary(max)') AS NVARCHAR(MAX));
base64
does not encode a string value, but the binary pattern a system uses to keep that string in memory (this is valid for any data type actually). The bit pattern of a string differs with UTF-8
, UTF-16
, ASCII
whatever... And even worse there is BE and LE.
The steps to get base64
are:
base64
for this bit patternThe steps for the re-encoding are
base64
The very last step might bring up confusion... You have to know exactly which binary representation a system uses. You have to use exactly the same data type with exactly the same interpretation to get the values back.
With strings one has to know, that SQL-Server works with a very limited choice natively.
NVARCHAR
(NCHAR
), which is 2-byte encoded unicode in UCS-2
flavour (almost the same as utf-16
) VARCHAR
(CHAR
), which is 1-byte encoded extended ASCII. All non-latin characters are bound to a code page within the connected collation. But this is not UTF-8!Upvotes: 2