Gudradain
Gudradain

Reputation: 4753

How to use PHP PDO ODBC with SQL Server and Unicode Characters?

PHP PDO ODBC doesn't seem to be able to store NVARCHAR characters (UTF-16) without some manual encoding. This seems like a pretty commun "bug" all around the internet and no one seems to have a definitive solution.

How to reproduce the bug

  1. Try to insert the following japanese characters using PDO : こんにちは (It means hello)
  2. The following will be stored in your database : ã“ã‚“ã«ã¡ã¯
  3. Then retrieve it via PDO and print it on the screen, you will get : こんにちは

It's not so bad but it's not good either. PHP works but when you have other applications that are not in PHP that access that information from your database they will get the wrong string : ã“ã‚“ã«ã¡ã¯.

Ideally you would want to have こんにちは everywhere.

The symptom

It seems that PDO doesn't have any notion of NVARCHAR, aka characters that are encoded using 16 bits. In fact, everything that you pass to or retrieve from SQL SERVER via PDO will be by chunk of 8 bits. How to "prove" it? Here it is :

  1. You start with your japanese string こんにちは.

You first have to know that PHP consider string as binary and that (if it's set like that) it will store them in UTF-8.

So if we look at the binary representation of こんにちは, you will get E38193E38293E381ABE381A1E381AF which is also the binary representation that SQL SERVER will give you for ã“ã‚“ã«ã¡ã¯. (depends on your collation)

  1. Next, let's put it into UTF-16 because that's the format of NVARCHAR.

    $utf16_string = mb_convert_encoding('こんにちは', 'UTF-16LE');

The following will change the binary representation in PHP of こんにちは to 533093306B3061306F30 which is exactly the binary representation of こんにちは in SQL SERVER NVARCHAR.

  1. Next try to save it in SQL via PDO and you will get the following : S0“0k0a0o0

The SQL SERVER binary representation of S0“0k0a0o0 in VARCHAR is 533093306B3061306F30 which is also the binary representation of こんにちは in NVARCHAR.

A dirty solution

You can use the following to save and retrieve unicode data in SQL SERVER via PDO ODBC but it's ugly...

  1. You want to transform the data into exactly the same binary representation that SQL SERVER NVARCHAR will store it

    mb_convert_encoding('こんにちは', 'UTF-16LE');

  2. You want to receive it as a binary on SQL SERVER side and then transform it to NVARCHAR.

    @binary VARBINARY(40) SELECT @string = CONVERT(NVARCHAR(20), @binary);

  3. At this point you have こんにちは in your database. To retrieve it you want to resend it to PHP as a binary

  4. Once you get the binary in PHP, PHP will already have transform it into a hex string... So, you want to convert the hex string into a binary and then change the encoding from utf-16 to utf-8

    $result = mb_convert_encoding(hex2bin($string), 'UTF-8', 'UTF-16LE');

And you will be back with your こんにちは when you echo it to your webpage.

Basically, that's what the SQL driver should be doing for me instead of me doing it manually.

Did I forgot to configure something or I have to do it manually?

Upvotes: 5

Views: 3167

Answers (2)

Thorkil Johansen
Thorkil Johansen

Reputation: 49

I have been struggling all day, trying to figure out what is possible with ODBC/PHP 👍

My server is Windows 2012R2 server without any office programs installed. Only the MS 64bit Access driver (AccessDatabaseEngine_X64.exe file version 16.0.4519.1000)

ODBC Driver is 16.00.4513.1000 (ACEODBC.DLL from 07-03-2017)

My server is running PHP Ver 7.4 and Apache 2.4

I have added this to my PHP.INI: extension=php_com_dotnet.dll

I have tried both PHP/ODBC Classic and PHP/ODBC/PDO - Not successfull

For testing I always use the Danish island Ærø and the Polish city Łódź

I create the Excel and Access files on my laptop (Office 2016) and copy the files to the server. As said before... No Office installed on my server!

The AccessDB on my private laptop

I have tried the ADODB.Connection as well. I will not produce the right output.

<?PHP
$file_location = "C:\TestData\AccessPHP.accdb";
$connStr = "Driver={Microsoft Access Driver (*.mdb, *.accdb)};charset=UTF-8;Dbq=$file_location;";
$con = new COM("ADODB.Connection", NULL, CP_UTF8);  // specify UTF-8 code page
$con->Open($connStr);
$rst = new COM("ADODB.Recordset");
$sql = "SELECT * FROM demo1";
$rst->Open($sql, $con, 3, 3);  // adOpenStatic, adLockOptimistic
echo 'Łódź ' . bin2hex('Łódź') . ' Length: ' . mb_strlen('Łódź') . '<br>';
while (!$rst->EOF) {
    $p = $rst->Fields("Place");
    echo 'H: ' . bin2hex($p) . ' P: ' . $p . ' - P:(1252) ' . utf8_encode($p) . ': ' . $rst->Fields(2) . '<br>';
    $rst->MoveNext;
}
$rst->Close();
$con->Close();

Primary inspiration was from here: Unable to retrieve UTF-8 accented characters from Access via PDO_ODBC

The "problem" in this solution is that is uses the old .MDB file format. which I think one should try to avoid. Let's aim for newest drivers. (I'm hoping that @Gord Thompson will comment on this.)

So... The big question is this: Who has a problem ? Microsoft or PHP ?

I turned to PowerScript. It looks like this:

$query='select * from demo1' 
$conn = New-Object System.Data.Odbc.OdbcConnection
$conn.ConnectionString = "DSN=DemoDSN"
$conn.open()
$cmd = New-object System.Data.Odbc.OdbcCommand($query,$conn)
$ds = New-Object system.Data.DataSet
$numrows = (New-Object system.Data.odbc.odbcDataAdapter($cmd)).fill($ds) 
$conn.close()
foreach ($row in $ds.Tables[0].Rows ) {
    write-host "$($row[0]) `t $($row.Place)"
}
Write-host "Number of rows returned: $numrows"
$ds.tables

The result is as expected:

1    London
2    Ærø
3    Łódź
Number of rows returned: 3
Key Place  RowNameWithæøå
--- -----  --------------
  1 London          1,456
  2 Ærø                 8
  3 Łódź         -12,3456

OK. That indicates that PHP is the one with the problem.

I installed a trial version of this http://querytool.com It's a very nice piece of software that uses the MS 64bit drivers.

It works like a charm. And does both Excel and Access 100% ok

(Personally I'm using the free version of DBeaver, so I will save the 180$)

OK... The 'Advanced Query Tool' also proved that it is possible to get the MS Drivers to deliver the double-byte UTF-8 polish characters through ODBC.

It is so sad to realize, that the mighty PHP 7.4 is unable to treat ODBC correct :-(

As a last bonus info I can tell you, that when using the Excel driver (which - by the way - is the same DLL) then you need to know, that if the WorkBook-file contains a sheet called Sheet1, then you have to name it [Sheet1$] in your query: select * from [Sheet1$]

Well, this was not a 100% solution, but maybe it will inspire someone, to figure out how this PHP issue can be solved :-)

Upvotes: 0

Gord Thompson
Gord Thompson

Reputation: 123474

No, I don't think you forgot to configure anything. In fact, your explanation is the best I've found so far regarding the long-standing "issues" between PHP and Microsoft ODBC drivers. Those issues are especially puzzling given that the PDO_ODBC page says:

On Windows, PDO_ODBC ... is the recommended driver for connecting to Microsoft SQL Server databases.

However, on Windows they also offer PDO_SQLSRV which actually does appear to work correctly.

So it seems that PDO_ODBC "doesn't have any notion of NVARCHAR", rather than PDO as a whole.

(Similar problems arise when trying to use PHP with Microsoft Access ODBC if Unicode characters are involved)

Conclusion: PHP support for ODBC continues to be a bit of a mess, at least where Microsoft databases are concerned.

Upvotes: 2

Related Questions