Reputation: 14479
I was surprised that I was unable to find a straightforward answer to this question by searching.
I have a web application in PHP that takes user input. Due to the nature of the application, users may often use extended ASCII characters (a.k.a. "ALT codes").
My specific issue at the moment is with ALT code 26, which is a right arrow (→). This will be accompanied with other text to be stored in the same field (for example, 'this→that'
).
My column type is NVARCHAR.
Here's what I've tried:
I've tried doing no conversions and just inserting the value as normal, but the value gets stored as thisâ??that
.
I've tried converting the value to UCS-2 in PHP using iconv('UTF-8', 'UCS-2', $value)
, but I get an error saying Unclosed quotation mark after the character string 't'.
. The query ends up looking like this: UPDATE myTable SET myColumn = 'this�!that'
.
I've tried doing the above conversion and then adding an N before the quoted value, but I get the same error message. The query looks like this: UPDATE myTable SET myColumn = N'this�!that'
.
I've tried removing the UCS-2 conversion and just adding the N before the quoted value, and the query works again, but the value is stored as thisâ that
.
I've tried using utf8_decode($value)
in PHP, but then the arrow is just replaced with a question mark.
So can anyone answer the (seemingly simple) question of, how can I store this value in my database and then retrieve it as it was originally typed?
I'm using PHP 5.5 and MSSQL 2012. If any question of driver/OS version comes into play, it's a Linux server connecting via FreeTDS. There is no possibility of changing this.
Upvotes: 7
Views: 3216
Reputation: 37318
The accepted answer seems to do the job; yes you can encode it to base64
and then decode it back again, but then all the applications that use that remote database, should change and support the fields to be base64
encoded. My thought is that if there is a remote MS SQL Server database, there could be an other application (or applications) that may use it, so that application have to also be changed to support both plain and base64
encoding. And you'll have to also handle both plain text and base64
converted text.
I searched a little bit and I found how to send UNICODE text to the MS SQL Server using MS SQL commands and PHP to convert the UNICODE bytes to HEX numbers.
If you go at the PHP documentation for the mssql_fetch_array
(http://php.net/manual/ru/function.mssql-fetch-array.php#80076), you'll see at the comments a pretty good solution that converts the text to UNICODE HEX values and then sends that HEX data directly to MS SQL Server like this:
Convert Unicode Text to HEX Data
// sending data to database
$utf8 = 'Δοκιμή με unicode → Test with Unicode'; // some Greek text for example
$ucs2 = iconv('UTF-8', 'UCS-2LE', $utf8);
// converting UCS-2 string into "binary" hexadecimal form
$arr = unpack('H*hex', $ucs2);
$hex = "0x{$arr['hex']}";
// IMPORTANT!
// please note that value must be passed without apostrophes
// it should be "... values(0x0123456789ABCEF) ...", not "... values('0x0123456789ABCEF') ..."
mssql_query("INSERT INTO mytable (myfield) VALUES ({$hex})", $link);
Now all the text actually is stored to the NVARCHAR
database field correctly as UNICODE, and that's all you have to do in order to send and store it as plain text and not encoded.
To retrieve that text, you need to ask MS SQL Server to send back UNICODE encoded text like this:
Retrieving Unicode Text from MS SQL Server
// retrieving data from database
// IMPORTANT!
// please note that "varbinary" expects number of bytes
// in this example it must be 200 (bytes), while size of field is 100 (UCS-2 chars)
// myfield is of 50 length, so I set VARBINARY to 100
$result = mssql_query("SELECT CONVERT(VARBINARY(100), myfield) AS myfield FROM mytable", $link);
while (($row = mssql_fetch_array($result, MSSQL_BOTH)))
{
// we get data in UCS-2
// I use UTF-8 in my project, so I encode it back
echo '1. '.iconv('UCS-2LE', 'UTF-8', $row['myfield'])).PHP_EOL;
// or you can even use mb_convert_encoding to convert from UCS-2LE to UTF-8
echo '2. '.mb_convert_encoding($row['myfield'], 'UTF-8', 'UCS-2LE').PHP_EOL;
}
The MS SQL Table with the UNICODE Data after the INSERT
The output result using a PHP page to display the values
I'm not sure if you can reach my test page here, but you can try to see the live results: http://dbg.deve.wiznet.gr/php56/mssql/test1.php
Upvotes: 1
Reputation: 2043
You might try base64 encoding the input, this is fairly trivial to handle with PHP's base64_encode()
and base64_decode()
and it should handle what ever your users throw at it.
(edit: You can apparently also do the base64 encoding on the SQL Server side. This doesn't seem like something it should be responsible for imho, but it's an option.)
Upvotes: 5
Reputation: 17024
It seems like your freetds.conf
is wrong. You need a TDS protocol version >= 7.0 to support unicode. See this for more details.
Edit your freetds.conf
:
[global]
# TDS protocol version
tds version = 7.4
client charset = UTF-8
Also make sure to configure PHP correct:
ini_set('mssql.charset', 'UTF-8');
Upvotes: 1