Reputation: 394
I am creating a PHP script run from the command line on a typical LAMP stack (L = OS X) and am having a lot of trouble getting special chars to record properly in the database.
This script scans a directory recursively and inserts complete path into a MySQL database table. I've done a lot of research into how to get special chars to write to MySQL but they show up as ?
chars.
Here is the code:
<?PHP
ini_set('default_charset', 'UTF-8');
$link = mysql_connect('localhost', '--USER--', '--PASSWORD--');
mysql_set_charset('utf8',$link);
if (!$link) {
die('Could not connect: ' . mysql_error());
}
if(!mysql_select_db("files")) {
die('Could not connect: ' . mysql_error());
}
mysql_query("SET NAMES utf8");
mysql_query("SET CHARACTER SET utf8");
function startsWith($haystack, $needle) {
return $needle === "" || strrpos($haystack, $needle, -strlen($haystack)) !== FALSE;
}
function getDirContents($dir, &$results = array()) {
$files = scandir($dir);
foreach($files as $key => $value) {
$path = realpath($dir.DIRECTORY_SEPARATOR.$value);
if(startsWith($path,'/Volumes/Macintosh HD/')) {
unset($files[$key]);
} else if(!is_dir($path) && !startsWith($value,'.') && startsWith($path,'/Volumes/')) {
$results[] = $path;
$query="INSERT IGNORE INTO files (path,dir) VALUES ('$path','0')";
mysql_query($query);
} else if(is_dir($path) && !startsWith($value,'.') && startsWith($path,'/Volumes/')) {
getDirContents($path, $results);
$results[] = $path;
$query="INSERT IGNORE INTO files (path,dir) VALUES ('$path','1')";
mysql_query($query);
}
}
return $results;
}
$directory='/Volumes';
$files=getDirContents($directory);
sort($files);
print_r($files);
?>
The problematic path is:
/Volumes/Mac Stadium Shuttle 1/DIG2008060702/files/Susan-Jürgen.dvdproj/Contents/PkgInfo
Notice the umlaut char in Jürgen
. When the script prints all of the files in the array the ü
shows up correctly.
If I add a line to the PHP script to print the mysql_query()
, the following is returned:
INSERT IGNORE INTO files (path,dir) VALUES ('/Volumes/Mac Stadium Shuttle 1/DIG2008060702/files/Susan-Jürgen.dvdproj/Contents/PkgInfo','0')
Again the ü
shows up correctly.
From the MySQL command line client, I SELECT
this path:
mysql> select * from files where path like '%susan%';
...and the response:
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-----------+------+---------------+
| ID | path | dir | google_id | md5 | deleted_local |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-----------+------+---------------+
| 644990 | /Volumes/Mac Stadium Shuttle 1/DIG2008060702/files/Susan-Ju?rgen.dvdproj/Contents/PkgInfo | 0 | NULL | NULL | 0 |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-----------+------+---------------+
...notice the ü
in Jürgen shows up as u?
(Ju?rgen)
I have worked to make sure that:
I added phpinfo();
near the top of this script (after ini_set()
) and run it from the CLI. default_charset => UTF-8 => UTF-8
shows up in the response.
After the DB is connected in the script, I added echo mysql_client_encoding($link);
and the script printed utf8
.
Also, I ran:
mysql> show variables like 'char%';
response:
+--------------------------+--------------------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.6.24-osx10.8-x86_64/share/charsets/ |
+--------------------------+--------------------------------------------------------+
8 rows in set (0.05 sec)
So, what am I doing wrong?
EDIT The structure of the table is:
mysql> DESCRIBE files;
+---------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+------------------+------+-----+---------+----------------+
| ID | int(11) unsigned | NO | PRI | NULL | auto_increment |
| path | varchar(510) | YES | UNI | NULL | |
| dir | enum('0','1') | YES | | 0 | |
| google_id | varchar(255) | YES | | NULL | |
| md5 | varchar(255) | YES | | NULL | |
| deleted_local | enum('0','1') | YES | | 0 | |
+---------------+------------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)
ANOTHER EDIT:
mysql> show create table files;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| files | CREATE TABLE `files` (
`ID` int(11) unsigned NOT NULL AUTO_INCREMENT,
`path` varchar(510) CHARACTER SET latin1 DEFAULT NULL,
`dir` enum('0','1') CHARACTER SET latin1 DEFAULT '0',
`google_id` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`md5` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`deleted_local` enum('0','1') CHARACTER SET latin1 DEFAULT '0',
PRIMARY KEY (`ID`),
UNIQUE KEY `path` (`path`)
) ENGINE=InnoDB AUTO_INCREMENT=961879 DEFAULT CHARSET=utf8 |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.04 sec)
Upvotes: 1
Views: 890
Reputation: 3
1.set collation type of you database table field to utf8_unicode_ci
2.change in meta tag.
meta http-equiv="Content-Type" content="text/html; charset=UTF-8"
Upvotes: 0
Reputation: 6847
As shown in your second edit, the path column has the latin1 charset, even though the table defaults to utf8. Maybe you've got in this state by altering an existing table?
Try ALTER TABLE files MODIFY path VARCHAR(510) CHARACTER SET utf8;
Upvotes: 2