Matt Fellows
Matt Fellows

Reputation: 6532

PHP, PDO, UTF8 and MySQL not playing ball

I'm really struggling trying to get PHP to play ball with MySQL through PDO with regards to UTF-8 characters.

I get the following using php --version:

PHP 5.3.6 (cli) (built: Apr 19 2011 13:21:12)
Copyright (c) 1997-2011 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2011 Zend Technologies

And this is the welcome message from mysql cli:

Server version: 5.1.57 Source distribution

I have this as my database connection code:

try {
    $dbh = new PDO('mysql:host='.$server.';dbname='.$database.';charset=UTF8', $user, $password, array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'UTF8'"));
} catch (PDOException $e) {
    print "Error!: " . $e->getMessage() . "<br/>";
    die();
}

And my php code to insert into MySQL:

$dbh->query("SET NAMES 'UTF8'"); 
$values = array(':idCode' => $idCode, ':name' => $name, ':address' => $address, ':postcode' => $postcode, ':tel' => $tel, ':website' => $website, ':disabled' => $disabled, ':disabledWC' => $disabledWC, ':description' => $description, ':location' => $location, ':type' => $type, ':status' => $status, ':image1name' => $image1name, ':image2name' => $image2name, ':image3name' => $image3name, ':image4name' => $image4name, ':image5name' => $image5name);
$preparedStatement = $dbh->prepare('UPDATE venues SET name = :name, address = :address, postcode = :postcode, tel = :tel, website = :website, disabled = :disabled, disabledWC = :disabledWC, description = :description, location = :location, type = :type, status = :status, date_modified = NOW(), image1name = :image1name, image2name = :image2name, image3name = :image3name, image4name = :image4name, image5name = :image5name WHERE idCode = :idCode');
$preparedStatement->execute($values);

The problem I'm getting is that even though I've verified that $description does indeed contain UTF8 encoded bytes (Using bin2hex and a hex2UTF8 tool on the web) the data does not get submitted through the connection as UTF-8 and I end up with "Générik" instead of "Générik" in the database.

I have tried putting in a call to "SET NAMES UTF8" prior to executing, I've put in:

[client]
default-character-set=utf8

[mysqld]
default-character-set=utf8

in my my.cnf but can't seem to get any joy.

Any hints or tips on how to get this to work?

I should add - if I enter the text manually in the command line MySQL client it works fine, the column is set to take UTF8 encoding.

As requested - Create Table Script:

| venues | CREATE TABLE `venues` (
  `idCode` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(512) DEFAULT NULL,
  `address` varchar(1024) DEFAULT NULL,
  `postcode` varchar(32) DEFAULT NULL,
  `tel` varchar(32) DEFAULT NULL,
  `website` varchar(1024) DEFAULT NULL,
  `disabled` tinyint(4) DEFAULT NULL,
  `disabledWC` tinyint(4) DEFAULT NULL,
  `description` varchar(4096) CHARACTER SET utf8 DEFAULT NULL,
  `status` tinyint(4) DEFAULT NULL,
  `location` varchar(32) DEFAULT NULL,
  `type` varchar(32) DEFAULT NULL,
  `date_modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `image1name` varchar(255) DEFAULT NULL,
  `image2name` varchar(255) DEFAULT NULL,
  `image3name` varchar(255) DEFAULT NULL,
  `image4name` varchar(255) DEFAULT NULL,
  `image5name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`idCode`)
) ENGINE=MyISAM AUTO_INCREMENT=129 DEFAULT CHARSET=latin1

And hte result of SELECT @@character_set_database:

SELECT @@character_set_database
    -> ;
+--------------------------+
| @@character_set_database |
+--------------------------+
| utf8                     |
+--------------------------+
1 row in set (0.00 sec)

And the results of dumpParams:

SQL: [410] UPDATE venues SET name = :name, address = :address, postcode = :postcode, tel = :tel, website = :website, disabled = :disabled, disabledWC = :disabledWC, description = :description, location = :location, type = :type, status = :status, date_modified = NOW(), image1name = :image1name, image2name = :image2name, image3name = :image3name, image4name = :image4name, image5name = :image5name WHERE idCode = :idCode
Params:  17
Key: Name: [7] :idCode
paramno=-1
name=[7] ":idCode"
is_param=1
param_type=2
Key: Name: [5] :name
paramno=-1
name=[5] ":name"
is_param=1
param_type=2
Key: Name: [8] :address
paramno=-1
name=[8] ":address"
is_param=1
param_type=2
Key: Name: [9] :postcode
paramno=-1
name=[9] ":postcode"
is_param=1
param_type=2
Key: Name: [4] :tel
paramno=-1
name=[4] ":tel"
is_param=1
param_type=2
Key: Name: [8] :website
paramno=-1
name=[8] ":website"
is_param=1
param_type=2
Key: Name: [9] :disabled
paramno=-1
name=[9] ":disabled"
is_param=1
param_type=2
Key: Name: [11] :disabledWC
paramno=-1
name=[11] ":disabledWC"
is_param=1
param_type=2
Key: Name: [12] :description
paramno=-1
name=[12] ":description"
is_param=1
param_type=2
Key: Name: [9] :location
paramno=-1
name=[9] ":location"
is_param=1
param_type=2
Key: Name: [5] :type
paramno=-1
name=[5] ":type"
is_param=1
param_type=2
Key: Name: [7] :status
paramno=-1
name=[7] ":status"
is_param=1
param_type=2
Key: Name: [11] :image1name
paramno=-1
name=[11] ":image1name"
is_param=1
param_type=2
Key: Name: [11] :image2name
paramno=-1
name=[11] ":image2name"
is_param=1
param_type=2
Key: Name: [11] :image3name
paramno=-1
name=[11] ":image3name"
is_param=1
param_type=2
Key: Name: [11] :image4name
paramno=-1
name=[11] ":image4name"
is_param=1
param_type=2
Key: Name: [11] :image5name
paramno=-1
name=[11] ":image5name"
is_param=1
param_type=2

Logged hex bytes (and expected string value) of $description:

 Desctription: 526f756e64696e67206f6666205361747572646179206e6967687420696e20756e666f726765747461626c65207374796c652c204672656e636820636f6d70616e792047c3a96ec3a972696b20566170657572206272696e67207468656972206e65772073686f772057617465726c69747a20746f207468652048617420466169722e204174204b696e672047656f726765205620506c6179696e67204669656c64732c2042617220456e642e: Rounding off Saturday night in unforgettable style, French company Générik Vapeur bring their new show Waterlitz to the Hat Fair. At King George V Playing Fields, Bar End.

Upvotes: 6

Views: 4333

Answers (3)

Chris Marshall
Chris Marshall

Reputation: 5332

I had the same problem, and I figured it out with help from this post. I was feeding perfectly valid umlauts into the DB, but it was storing junk.

I basically just added this: array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8'") after my connect/init password, thusly:

$this->db = new PDO($this->dsn, $this->username, $this->password, 
                    array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8'"));

Upvotes: 2

eggyal
eggyal

Reputation: 125995

For the word Générik, your "logged hex bytes" are 0x47c3a96ec3a972696b. This is indeed UTF-8 encoded. The client with which you are attempting to verify your stored data is almost certainly setting the wrong character set prior to fetching the table contents.

Upvotes: 2

cojack
cojack

Reputation: 2620

  try {
      $dbh = new PDO(
          'mysql:host='.$server.';dbname='.$database, 
          $user, 
          $password,
          array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'UTF8'")
      );
      $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
  } catch (PDOException $e) {
      print "Error!: " . $e->getMessage() . "<br/>";
      die();
  }

Don't encode data before puting them to the db, just pure raw utf-8 data send. Should be enough.

Regards.

Upvotes: 7

Related Questions