andrewniesen
andrewniesen

Reputation: 394

Can't get UTF-8 Special Chars to Correctly Write to MySQL (PHP)

I am creating a PHP script run from the command line on a typical LAMP stack (L = OS X) and am having a lot of trouble getting special chars to record properly in the database.

This script scans a directory recursively and inserts complete path into a MySQL database table. I've done a lot of research into how to get special chars to write to MySQL but they show up as ? chars.

Here is the code:

<?PHP
ini_set('default_charset', 'UTF-8');


$link = mysql_connect('localhost', '--USER--', '--PASSWORD--');
mysql_set_charset('utf8',$link);

if (!$link) {
   die('Could not connect: ' . mysql_error());
}

if(!mysql_select_db("files")) {
   die('Could not connect: ' . mysql_error());
}

mysql_query("SET NAMES utf8");
mysql_query("SET CHARACTER SET utf8");

function startsWith($haystack, $needle) {
    return $needle === "" || strrpos($haystack, $needle, -strlen($haystack)) !== FALSE;
}

function getDirContents($dir, &$results = array()) {
  $files = scandir($dir);
    foreach($files as $key => $value) {
            $path = realpath($dir.DIRECTORY_SEPARATOR.$value);
            if(startsWith($path,'/Volumes/Macintosh HD/')) {
                    unset($files[$key]);
            } else if(!is_dir($path) && !startsWith($value,'.') && startsWith($path,'/Volumes/')) {
                    $results[] = $path;
                    $query="INSERT IGNORE INTO files (path,dir) VALUES ('$path','0')";
                    mysql_query($query);
            } else if(is_dir($path) && !startsWith($value,'.') && startsWith($path,'/Volumes/')) {
                    getDirContents($path, $results);
                    $results[] = $path;
                    $query="INSERT IGNORE INTO files (path,dir) VALUES ('$path','1')";
                    mysql_query($query);
            }
    }
    return $results;
}


$directory='/Volumes'; 
$files=getDirContents($directory);
sort($files);
print_r($files);

?>

The problematic path is:

/Volumes/Mac Stadium Shuttle 1/DIG2008060702/files/Susan-Jürgen.dvdproj/Contents/PkgInfo

Notice the umlaut char in Jürgen. When the script prints all of the files in the array the ü shows up correctly.

If I add a line to the PHP script to print the mysql_query(), the following is returned:

INSERT IGNORE INTO files (path,dir) VALUES ('/Volumes/Mac Stadium Shuttle 1/DIG2008060702/files/Susan-Jürgen.dvdproj/Contents/PkgInfo','0')

Again the ü shows up correctly.

From the MySQL command line client, I SELECT this path:

mysql> select * from files where path like '%susan%';

...and the response:

+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-----------+------+---------------+
| ID     | path                                                                                                                                                                  | dir  | google_id | md5  | deleted_local |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-----------+------+---------------+
| 644990 | /Volumes/Mac Stadium Shuttle 1/DIG2008060702/files/Susan-Ju?rgen.dvdproj/Contents/PkgInfo                                                                             | 0    | NULL      | NULL | 0             |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-----------+------+---------------+

...notice the ü in Jürgen shows up as u? (Ju?rgen)

I have worked to make sure that:

I added phpinfo(); near the top of this script (after ini_set()) and run it from the CLI. default_charset => UTF-8 => UTF-8 shows up in the response.

After the DB is connected in the script, I added echo mysql_client_encoding($link); and the script printed utf8.

Also, I ran:

mysql> show variables like 'char%';

response:

 +--------------------------+--------------------------------------------------------+
 | Variable_name            | Value                                                  |
 +--------------------------+--------------------------------------------------------+
 | character_set_client     | utf8                                                   |
 | character_set_connection | utf8                                                   |
 | character_set_database   | utf8                                                   |
 | character_set_filesystem | binary                                                 |
 | character_set_results    | utf8                                                   |
 | character_set_server     | utf8                                                   |
 | character_set_system     | utf8                                                   |
 | character_sets_dir       | /usr/local/mysql-5.6.24-osx10.8-x86_64/share/charsets/ |
 +--------------------------+--------------------------------------------------------+
 8 rows in set (0.05 sec)

So, what am I doing wrong?

EDIT The structure of the table is:

 mysql> DESCRIBE files;
 +---------------+------------------+------+-----+---------+----------------+
 | Field         | Type             | Null | Key | Default | Extra          |
 +---------------+------------------+------+-----+---------+----------------+
 | ID            | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
 | path          | varchar(510)     | YES  | UNI | NULL    |                |
 | dir           | enum('0','1')    | YES  |     | 0       |                |
 | google_id     | varchar(255)     | YES  |     | NULL    |                |
 | md5           | varchar(255)     | YES  |     | NULL    |                |
 | deleted_local | enum('0','1')    | YES  |     | 0       |                |
 +---------------+------------------+------+-----+---------+----------------+
 6 rows in set (0.00 sec)

ANOTHER EDIT:

 mysql> show create table files;
 +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | Table | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
 +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | files | CREATE TABLE `files` (
      `ID` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `path` varchar(510) CHARACTER SET latin1 DEFAULT NULL,
      `dir` enum('0','1') CHARACTER SET latin1 DEFAULT '0',
      `google_id` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
      `md5` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
      `deleted_local` enum('0','1') CHARACTER SET latin1 DEFAULT '0',
      PRIMARY KEY (`ID`),
      UNIQUE KEY `path` (`path`)
 ) ENGINE=InnoDB AUTO_INCREMENT=961879 DEFAULT CHARSET=utf8 |
  +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  1 row in set (0.04 sec)

Upvotes: 1

Views: 890

Answers (2)

Kailash Pathak
Kailash Pathak

Reputation: 3

1.set collation type of you database table field to utf8_unicode_ci

2.change in meta tag.

meta http-equiv="Content-Type" content="text/html; charset=UTF-8"

  1. you can use echo utf8_encode($value); in your page.

Upvotes: 0

Ewan Mellor
Ewan Mellor

Reputation: 6847

As shown in your second edit, the path column has the latin1 charset, even though the table defaults to utf8. Maybe you've got in this state by altering an existing table?

Try ALTER TABLE files MODIFY path VARCHAR(510) CHARACTER SET utf8;

Upvotes: 2

Related Questions