Manolis

Reputation: 891

PDO DBLIB multibyte (chinese) character encoding - SQL server

On a Linux machine, I am using PDO DBLIB to connect to an MSSQL database and insert data into a table whose collation is SQL_Latin1_General_CP1_CI_AS. The problem is that when I try to insert Chinese (multibyte) characters, they are stored as mojibake such as 哈市香åŠåŒºç æ±Ÿè·¯å·.

My (part of) code is as follows:

$DBH = new PDO("dblib:host=$myServer;dbname=$myDB;", $myUser, $myPass);

$query = "
    INSERT INTO UserSignUpInfo
    (FirstName)
    VALUES
    (:firstname)";

$STH = $DBH->prepare($query);

$STH->bindParam(':firstname', $firstname);
$STH->execute();

What I've tried so far:

  1. Doing mb_convert_encoding to UTF-16LE on $firstname and CAST as VARBINARY in the query like:

    $firstname = mb_convert_encoding($firstname, 'UTF-16LE', 'UTF-8');

    VALUES
    (CAST(:firstname AS VARBINARY));
    

    This inserts the characters properly, until the input also contains some non-multibyte characters, which breaks the PDO execute.

  2. Setting my connection as utf8:

    $DBH = new PDO("dblib:host=$myServer;dbname=$myDB;charset=UTF-8;", $myUser, $myPass);
    $DBH->exec('SET CHARACTER SET utf8');
    $DBH->query("SET NAMES utf8");
    
  3. Setting client charset to UTF-8 in my freetds.conf

    Which had no impact.
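For reference, the freetds.conf change in question would look roughly like the fragment below (the TDS version shown is an assumption; use whatever matches your server):

```ini
[global]
    ; TDS 7.2 or later is typically needed for SQL Server 2005+
    tds version = 7.2
    ; Ask FreeTDS to convert between the server charset and UTF-8 on the client side
    client charset = UTF-8
```

Note that this setting only affects how FreeTDS converts data on the wire; it cannot make a varchar column in a Latin collation store Chinese characters.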

Is there any way at all to insert multibyte data into that SQL Server database? Is there any other workaround? I've thought of trying PDO_ODBC or even the mssql extension, but thought it better to ask here before wasting any more time.

Thanks in advance.

EDIT:

I ended up using the mssql extension and the N data type prefix. I will switch to PDO_ODBC and try it when I have more time. Thanks everyone for the answers!

Upvotes: 16

Views: 5505

Answers (5)

Drakes

Reputation: 23670

Is there any way at all, to insert multibyte data in [this particular] SQL database? Is there any other workaround?

  1. If you can switch to PDO_ODBC, Microsoft provides free SQL Server ODBC drivers for Linux (only for 64-bit Red Hat Enterprise Linux, and 64-bit SUSE Linux Enterprise) which support Unicode.

  2. If you can change to PDO_ODBC, then the N-prefix for inserting Unicode is going to work.

  3. If you can change the affected column from the single-byte SQL_Latin1_General_CP1_CI_AS collation to a Unicode type such as nvarchar, that would be ideal.

Your case is more restricted: you have mixed multibyte and single-byte characters in your input string, you need to save them to a Latin-collated table, the N data type prefix isn't working, and you don't want to move away from PDO_DBLIB (because Microsoft's Unicode-capable ODBC driver is barely supported on Linux). Here is one workaround.

Conditionally encode the input string as base64. After all, that is how pictures are safely transported inline in emails.

Working Example:

$DBH = new PDO("dblib:host=$myServer;dbname=$myDB;", $myUser, $myPass);

$query = "
INSERT INTO [StackOverflow].[dbo].[UserSignUpInfo]
           ([FirstName])
     VALUES
           (:firstname)";

$STH = $DBH->prepare($query);

$firstname = "输入中国文字!Okay!";

/* First, check whether the string contains any multibyte (non-ASCII) characters.
   utf8_decode() collapses each multibyte sequence into a single byte, so the
   lengths differ only when multibyte characters are present. */
if (strlen($firstname) != strlen(utf8_decode($firstname))) {
    /* If so, base64-encode the string before saving it. */
    $firstname = base64_encode($firstname);
}

$STH->bindParam(':firstname', $firstname);
$STH->execute(); 

Then, to go backwards, you can test for base64 strings and decode only those, without damaging your existing entries, like so:

while ($row = $STH->fetch()) {
    $entry = $row[0];

    if (base64_encode(base64_decode($entry, true)) === $entry) {

         /* Decoding and re-encoding a true base64 string results in the original entry */
         print_r(base64_decode($entry) . PHP_EOL);

    } else {

         /* Previous entries not encoded will fall through gracefully */
         print_r($entry . PHP_EOL);
    }
}

Entries will be saved like this:

Guan Tianlang
5pys6Kqe44KS5a2maGVsbG8=

But you can easily convert them back to:

Guan Tianlang
输入中国文字!Okay!
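The conditional encode/decode logic above is language-neutral; here is a minimal Python sketch of the same two checks (the helper names are mine, not part of the answer):

```python
import base64

def maybe_b64encode(s: str) -> str:
    """Base64-encode s only if it contains multibyte (non-ASCII) characters,
    mirroring the strlen/utf8_decode check in the PHP example."""
    if any(ord(c) > 127 for c in s):
        return base64.b64encode(s.encode("utf-8")).decode("ascii")
    return s

def is_b64(s: str) -> bool:
    """True if strict decoding and re-encoding reproduces s exactly,
    i.e. s is a canonical base64 string."""
    try:
        return base64.b64encode(base64.b64decode(s, validate=True)).decode("ascii") == s
    except ValueError:  # binascii.Error is a ValueError subclass
        return False

stored = maybe_b64encode("输入中国文字!Okay!")   # becomes base64
plain  = maybe_b64encode("Guan Tianlang")        # left untouched
print(is_b64(stored), is_b64(plain))             # True False
```

One caveat of the round-trip test: a plain entry that happens to be valid base64 on its own (e.g. "abcd") would be mis-detected and decoded, so this is a heuristic rather than a guarantee.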

Upvotes: 9

I A Khan

Reputation: 8859

Just use the nvarchar, ntext, or nchar column types, and when you insert, prefix the string literal with N:

INSERT INTO UserSignUpInfo
    (FirstName)
    VALUES
    (N'firstname');

The N prefix marks the literal as Unicode (national) character data, and it is a worldwide standard.

Ref:

https://aalamrangi.wordpress.com/2012/05/13/storing-and-retrieving-non-english-unicode-characters-hindi-czech-arabic-etc-in-sql-server/

https://technet.microsoft.com/en-us/library/ms191200(v=sql.105).aspx

https://irfansworld.wordpress.com/2011/01/25/what-is-unicode-and-non-unicode-data-formats/

Upvotes: -1

Tharif

Reputation: 13971

You can use a Unicode-compatible data type for the table column to support foreign languages (exceptions are shown in EDIT 2).

(char, varchar, text) Versus (nchar, nvarchar, ntext)

Non-Unicode:

Best suited for US English: "One problem with data types that use 1 byte to encode each character is that the data type can only represent 256 different characters. This forces multiple encoding specifications (or code pages) for different alphabets such as European alphabets, which are relatively small. It is also impossible to handle systems such as the Japanese Kanji or Korean Hangul alphabets that have thousands of characters."

Unicode:

Best suited for systems that need to support at least one foreign language: "The Unicode specification defines a single encoding scheme for most characters widely used in businesses around the world. All computers consistently translate the bit patterns in Unicode data into characters using the single Unicode specification. This ensures that the same bit pattern is always converted to the same character on all computers. Data can be freely transferred from one database or computer to another without concern that the receiving system will translate the bit patterns into characters incorrectly."

Example:

I have also tried an example; you can view its screenshot below. It should be helpful for issues like this one involving foreign-language insertion. The column shown is nvarchar, and it does support Chinese.

[screenshot: an nvarchar column storing Chinese characters]

EDIT 1:

Another related issue is discussed here

EDIT 2 :

Unicode unsupported scripts are shown here

Upvotes: 1

Jon Tirjan

Reputation: 3694

Collation shouldn't matter here.

Double-byte characters need to be stored in nvarchar, nchar, or ntext fields. You don't need to perform any casting.

The N data type prefix stands for National, and it causes SQL Server to store the text as Unicode (UTF-16).
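To see why: the garbage in the question is what UTF-8 bytes look like when they are squeezed through a single-byte (Latin) code page, whereas the N types store UTF-16 code units. A small Python illustration (not part of the original answer):

```python
s = "哈市"  # two Chinese characters, as in the question's data

# Store the UTF-8 bytes in a single-byte column and read them back as
# Latin-1: each of the six UTF-8 bytes becomes its own (wrong) character.
mojibake = s.encode("utf-8").decode("latin-1")
print(mojibake)                    # mojibake similar to the question's output

# nvarchar/nchar/ntext store UTF-16: two bytes per BMP character.
print(len(s.encode("utf-16-le")))  # 4
```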

Edit:

PDO_DBLIB does not support Unicode, and is now deprecated.

If you can switch to PDO_ODBC, Microsoft provides free SQL Server ODBC drivers for Linux which support Unicode.

Microsoft - SQL Server ODBC Driver Documentation

Blog - Installing and Using the Microsoft SQL Server ODBC Driver for Linux

Upvotes: 2

ABIRAMAN

Reputation: 969

This link explains Chinese characters in MySQL: Can't insert Chinese character into MySQL. You have to create the table with CHARACTER SET utf8, and use UTF-8 when you insert into it:

    CREATE TABLE table_name () CHARACTER SET = utf8;
    SET NAMES utf8;
    INSERT INTO table_name (ABC) VALUES (VAL);

Also, create the database with CHARACTER SET utf8 COLLATE utf8_general_ci. Then you can insert Chinese characters into the table.

Upvotes: -2
