Reputation: 4753
PHP PDO ODBC doesn't seem to be able to store NVARCHAR characters (UTF-16) without some manual encoding. This seems like a pretty commun "bug" all around the internet and no one seems to have a definitive solution.
How to reproduce the bug
こんにちは
(It means hello)ã“ã‚“ã«ã¡ã¯
こんにちは
It's not so bad but it's not good either. PHP works but when you have other applications that are not in PHP that access that information from your database they will get the wrong string : ã“ã‚“ã«ã¡ã¯
.
Ideally you would want to have こんにちは
everywhere.
The symptom
It seems that PDO doesn't have any notion of NVARCHAR, aka characters that are encoded using 16 bits. In fact, everything that you pass to or retrieve from SQL SERVER via PDO will be by chunk of 8 bits. How to "prove" it? Here it is :
こんにちは
.You first have to know that PHP consider string as binary and that (if it's set like that) it will store them in UTF-8.
So if we look at the binary representation of こんにちは
, you will get E38193E38293E381ABE381A1E381AF
which is also the binary representation that SQL SERVER will give you for ã“ã‚“ã«ã¡ã¯
. (depends on your collation)
Next, let's put it into UTF-16 because that's the format of NVARCHAR.
$utf16_string = mb_convert_encoding('こんにちは', 'UTF-16LE');
The following will change the binary representation in PHP of こんにちは
to 533093306B3061306F30
which is exactly the binary representation of こんにちは
in SQL SERVER NVARCHAR.
S0“0k0a0o0
The SQL SERVER binary representation of S0“0k0a0o0
in VARCHAR is 533093306B3061306F30
which is also the binary representation of こんにちは
in NVARCHAR.
A dirty solution
You can use the following to save and retrieve unicode data in SQL SERVER via PDO ODBC but it's ugly...
You want to transform the data into exactly the same binary representation that SQL SERVER NVARCHAR will store it
mb_convert_encoding('こんにちは', 'UTF-16LE');
You want to receive it as a binary on SQL SERVER side and then transform it to NVARCHAR.
@binary VARBINARY(40) SELECT @string = CONVERT(NVARCHAR(20), @binary);
At this point you have こんにちは
in your database. To retrieve it you want to resend it to PHP as a binary
Once you get the binary in PHP, PHP will already have transform it into a hex string... So, you want to convert the hex string into a binary and then change the encoding from utf-16 to utf-8
$result = mb_convert_encoding(hex2bin($string), 'UTF-8', 'UTF-16LE');
And you will be back with your こんにちは
when you echo it to your webpage.
Basically, that's what the SQL driver should be doing for me instead of me doing it manually.
Did I forgot to configure something or I have to do it manually?
Upvotes: 5
Views: 3167
Reputation: 49
I have been struggling all day, trying to figure out what is possible with ODBC/PHP 👍
My server is Windows 2012R2 server without any office programs installed. Only the MS 64bit Access driver (AccessDatabaseEngine_X64.exe file version 16.0.4519.1000)
ODBC Driver is 16.00.4513.1000 (ACEODBC.DLL from 07-03-2017)
My server is running PHP Ver 7.4 and Apache 2.4
I have added this to my PHP.INI: extension=php_com_dotnet.dll
I have tried both PHP/ODBC Classic and PHP/ODBC/PDO - Not successfull
For testing I always use the Danish island Ærø
and the Polish city Łódź
I create the Excel and Access files on my laptop (Office 2016) and copy the files to the server. As said before... No Office installed on my server!
I have tried the ADODB.Connection as well. I will not produce the right output.
<?PHP
$file_location = "C:\TestData\AccessPHP.accdb";
$connStr = "Driver={Microsoft Access Driver (*.mdb, *.accdb)};charset=UTF-8;Dbq=$file_location;";
$con = new COM("ADODB.Connection", NULL, CP_UTF8); // specify UTF-8 code page
$con->Open($connStr);
$rst = new COM("ADODB.Recordset");
$sql = "SELECT * FROM demo1";
$rst->Open($sql, $con, 3, 3); // adOpenStatic, adLockOptimistic
echo 'Łódź ' . bin2hex('Łódź') . ' Length: ' . mb_strlen('Łódź') . '<br>';
while (!$rst->EOF) {
$p = $rst->Fields("Place");
echo 'H: ' . bin2hex($p) . ' P: ' . $p . ' - P:(1252) ' . utf8_encode($p) . ': ' . $rst->Fields(2) . '<br>';
$rst->MoveNext;
}
$rst->Close();
$con->Close();
Primary inspiration was from here: Unable to retrieve UTF-8 accented characters from Access via PDO_ODBC
The "problem" in this solution is that is uses the old .MDB file format. which I think one should try to avoid. Let's aim for newest drivers. (I'm hoping that @Gord Thompson will comment on this.)
So... The big question is this: Who has a problem ? Microsoft or PHP ?
I turned to PowerScript. It looks like this:
$query='select * from demo1'
$conn = New-Object System.Data.Odbc.OdbcConnection
$conn.ConnectionString = "DSN=DemoDSN"
$conn.open()
$cmd = New-object System.Data.Odbc.OdbcCommand($query,$conn)
$ds = New-Object system.Data.DataSet
$numrows = (New-Object system.Data.odbc.odbcDataAdapter($cmd)).fill($ds)
$conn.close()
foreach ($row in $ds.Tables[0].Rows ) {
write-host "$($row[0]) `t $($row.Place)"
}
Write-host "Number of rows returned: $numrows"
$ds.tables
The result is as expected:
1 London
2 Ærø
3 Łódź
Number of rows returned: 3
Key Place RowNameWithæøå
--- ----- --------------
1 London 1,456
2 Ærø 8
3 Łódź -12,3456
OK. That indicates that PHP is the one with the problem.
I installed a trial version of this http://querytool.com It's a very nice piece of software that uses the MS 64bit drivers.
It works like a charm. And does both Excel and Access 100% ok
(Personally I'm using the free version of DBeaver, so I will save the 180$)
OK... The 'Advanced Query Tool' also proved that it is possible to get the MS Drivers to deliver the double-byte UTF-8 polish characters through ODBC.
It is so sad to realize, that the mighty PHP 7.4 is unable to treat ODBC correct :-(
As a last bonus info I can tell you, that when using the Excel driver (which - by the way - is the same DLL) then you need to know, that if the WorkBook-file contains a sheet called Sheet1, then you have to name it [Sheet1$]
in your query: select * from [Sheet1$]
Well, this was not a 100% solution, but maybe it will inspire someone, to figure out how this PHP issue can be solved :-)
Upvotes: 0
Reputation: 123474
No, I don't think you forgot to configure anything. In fact, your explanation is the best I've found so far regarding the long-standing "issues" between PHP and Microsoft ODBC drivers. Those issues are especially puzzling given that the PDO_ODBC page says:
On Windows, PDO_ODBC ... is the recommended driver for connecting to Microsoft SQL Server databases.
However, on Windows they also offer PDO_SQLSRV which actually does appear to work correctly.
So it seems that PDO_ODBC "doesn't have any notion of NVARCHAR", rather than PDO as a whole.
(Similar problems arise when trying to use PHP with Microsoft Access ODBC if Unicode characters are involved)
Conclusion: PHP support for ODBC continues to be a bit of a mess, at least where Microsoft databases are concerned.
Upvotes: 2