Reputation: 891
On a Linux machine, I am using PDO DBLIB to connect to an MSSQL database and insert data into a table with the SQL_Latin1_General_CP1_CI_AS collation. The problem is that when I try to insert Chinese (multibyte) characters, they end up stored as 哈市香åŠåŒºç 江路å·.
Part of my code is as follows:
$DBH = new PDO("dblib:host=$myServer;dbname=$myDB;", $myUser, $myPass);
$query = "
INSERT INTO UserSignUpInfo
(FirstName)
VALUES
(:firstname)";
$STH = $DBH->prepare($query);
$STH->bindParam(':firstname', $firstname);
What I've tried so far:
Doing mb_convert_encoding to UTF-16LE on $firstname and a CAST to VARBINARY in the query, like:
$firstname = mb_convert_encoding($firstname, 'UTF-16LE', 'UTF-8');
VALUES
(CAST(:firstname AS VARBINARY));
This inserts the characters properly, until the input also contains some non-multibyte characters, which breaks the PDO execute.
Setting my connection as utf8:
$DBH = new PDO("dblib:host=$myServer;dbname=$myDB;charset=UTF-8;", $myUser, $myPass);
$DBH->exec('SET CHARACTER SET utf8');
$DBH->query("SET NAMES utf8");
Setting the client charset to UTF-8 in my freetds.conf, which had no impact.
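For reference, the freetds.conf fragment I mean is just a client charset line; roughly (the TDS version here is only an example value):

[global]
    tds version = 7.2
    client charset = UTF-8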
Is there any way at all to insert multibyte data into that SQL database? Is there any other workaround? I've thought of trying PDO ODBC or even mssql, but thought it better to ask here before wasting any more time.
Thanks in advance.
EDIT:
I ended up using the mssql extension and the N data type prefix. I will switch to and try PDO_ODBC when I have more time. Thanks everyone for the answers!
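Roughly, that ended up looking like this (a sketch only; ext/mssql has no parameter binding for plain queries, so quotes are doubled by hand):

$link = mssql_connect($myServer, $myUser, $myPass);
mssql_select_db($myDB, $link);

// N'...' marks the literal as Unicode; the column is assumed to be nvarchar
$safe = str_replace("'", "''", $firstname);
mssql_query("INSERT INTO UserSignUpInfo (FirstName) VALUES (N'" . $safe . "')", $link);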
Upvotes: 16
Views: 5505
Reputation: 23670
Is there any way at all to insert multibyte data in [this particular] SQL database? Is there any other workaround?
If you can switch to PDO_ODBC, Microsoft provides free SQL Server ODBC drivers for Linux (64-bit Red Hat Enterprise Linux and 64-bit SUSE Linux Enterprise only) which support Unicode.
With PDO_ODBC, the N prefix for inserting Unicode will work.
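A rough sketch of that route (the "sqlserver" DSN name is an assumption; use whatever DSN you define in /etc/odbc.ini for the Microsoft driver):

// Sketch only: PDO_ODBC connection through an assumed "sqlserver" DSN
$DBH = new PDO("odbc:sqlserver", $myUser, $myPass);
$STH = $DBH->prepare("INSERT INTO UserSignUpInfo (FirstName) VALUES (N'输入中国文字')");
$STH->execute();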
If you can change the affected table from the SQL_Latin1_General_CP1_CI_AS collation to one that supports Unicode, that would be ideal.
Your case is more restricted: your input string mixes multibyte and non-multibyte characters, it has to be saved to a Latin table, the N data type prefix isn't working, and you don't want to move away from PDO DBLIB (because Microsoft's Unicode-capable PDO_ODBC is barely supported on Linux). Here is one workaround.
Conditionally encode the input string as base64. After all, that is how we safely transport pictures inline in emails.
Working Example:
$DBH = new PDO("dblib:host=$myServer;dbname=$myDB;", $myUser, $myPass);
$query = "
INSERT INTO [StackOverflow].[dbo].[UserSignUpInfo]
([FirstName])
VALUES
(:firstname)";
$STH = $DBH->prepare($query);
$firstname = "输入中国文字!Okay!";
/* First, check if this string has any Unicode at all */
if (strlen($firstname) != strlen(utf8_decode($firstname))) {
/* If so, change the string to base64. */
$firstname = base64_encode($firstname);
}
$STH->bindParam(':firstname', $firstname);
$STH->execute();
Then to go backwards, you can test for base64 strings, and decode only them without damaging your existing entries, like so:
while ($row = $STH->fetch()) {
    $entry = $row[0];

    if (base64_encode(base64_decode($entry, true)) === $entry) {
        /* Decoding and re-encoding a true base64 string results in the original entry */
        print_r(base64_decode($entry) . PHP_EOL);
    } else {
        /* Previous entries not encoded will fall through gracefully */
        print_r($entry . PHP_EOL);
    }
}
Entries will be saved like this:
Guan Tianlang
6L6T5YWl5Lit5Zu95paH5a2X77yBT2theSE=
But you can easily convert them back to:
Guan Tianlang
输入中国文字!Okay!
Upvotes: 9
Reputation: 8859
Just use nvarchar, ntext, or nchar, and when you want to insert, use:
INSERT INTO UserSignUpInfo
(FirstName)
VALUES
(N'firstname');
The N prefix marks the literal as Unicode (national) character data, and it is a worldwide standard.
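If the column is still a plain varchar, it needs to become one of those national types first; a sketch of that (the nvarchar(100) length is only an example, and $DBH is the PDO connection from the question):

// Sketch only: convert the existing column to nvarchar so N'...' inserts can store Unicode
$DBH->exec("ALTER TABLE UserSignUpInfo ALTER COLUMN FirstName nvarchar(100)");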
Ref :
https://technet.microsoft.com/en-us/library/ms191200(v=sql.105).aspx
https://irfansworld.wordpress.com/2011/01/25/what-is-unicode-and-non-unicode-data-formats/
Upvotes: -1
Reputation: 13971
You can use a Unicode-compatible data type for the table column to support foreign languages (exceptions are shown in EDIT 2).
(char, varchar, text) Versus (nchar, nvarchar, ntext)
Non-Unicode:
Best suited for US English: "One problem with data types that use 1 byte to encode each character is that the data type can only represent 256 different characters. This forces multiple encoding specifications (or code pages) for different alphabets such as European alphabets, which are relatively small. It is also impossible to handle systems such as the Japanese Kanji or Korean Hangul alphabets that have thousands of characters."
Unicode:
Best suited for systems that need to support at least one foreign language: "The Unicode specification defines a single encoding scheme for most characters widely used in businesses around the world. All computers consistently translate the bit patterns in Unicode data into characters using the single Unicode specification. This ensures that the same bit pattern is always converted to the same character on all computers. Data can be freely transferred from one database or computer to another without concern that the receiving system will translate the bit patterns into characters incorrectly."
Example:
I have also tried an example, which should help with foreign-language insertion issues like the one in this question. The column, as in the sketch below, is nvarchar, and it does support Chinese.
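The sketch below only illustrates that point; the table and column names are invented, and $DBH is assumed to be the PDO connection from the question:

// Illustrative only: an nvarchar column accepts Chinese text directly with an N'...' literal
$DBH->exec("CREATE TABLE LanguageDemo (Name nvarchar(100))");
$DBH->exec("INSERT INTO LanguageDemo (Name) VALUES (N'输入中国文字')");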
EDIT 1:
Another related issue is discussed here
EDIT 2 :
Unicode unsupported scripts are shown here
Upvotes: 1
Reputation: 3694
Collation shouldn't matter here.
Double-byte characters need to be stored in nvarchar, nchar, or ntext fields. You don't need to perform any casting.
The N data type prefix stands for National, and it causes SQL Server to store text as Unicode (UTF-16).
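A small illustration of that point, assuming FirstName has already been changed to nvarchar and reusing the question's $DBH connection: DATALENGTH then reports two bytes per (BMP) character.

// Illustrative check: nvarchar data occupies 2 bytes per character on the server
$rows = $DBH->query("SELECT FirstName, DATALENGTH(FirstName) AS bytes FROM UserSignUpInfo");
foreach ($rows as $row) {
    echo $row['FirstName'] . ' -> ' . $row['bytes'] . " bytes\n";
}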
Edit:
PDO_DBLIB does not support Unicode, and is now deprecated.
If you can switch to PDO_ODBC, Microsoft provides free SQL Server ODBC drivers for Linux which support Unicode.
Microsoft - SQL Server ODBC Driver Documentation
Blog - Installing and Using the Microsoft SQL Server ODBC Driver for Linux
Upvotes: 2
Reputation: 969
This link explains Chinese characters in MySQL: Can't insert Chinese character into MySQL.
You have to create the table with a utf8 character set:
CREATE TABLE table_name () CHARACTER SET = utf8;
Use UTF-8 when you insert into the table:
SET NAMES utf8; INSERT INTO table_name (ABC, VAL);
and create the database with CHARACTER SET utf8 COLLATE utf8_general_ci;
then you can insert Chinese characters into the table.
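A minimal sketch of that MySQL-side setup (note that the question is about MSSQL, so this only applies if the target really is MySQL; host, credentials, and names are placeholders):

// Placeholders throughout; applies to MySQL only
$pdo = new PDO("mysql:host=localhost;dbname=demo;charset=utf8", $user, $pass);
$pdo->exec("CREATE TABLE IF NOT EXISTS table_name (ABC VARCHAR(100)) CHARACTER SET = utf8");
$stmt = $pdo->prepare("INSERT INTO table_name (ABC) VALUES (?)");
$stmt->execute(["输入中国文字"]);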
Upvotes: -2