Reputation: 760
Based on the answer user1830391 gave to "Some characters in CSV file are not read during PHP fgetcsv()", I updated the code below to use fgets() instead of fgetcsv(). That fixed my first-character issue, so that is no longer a problem. But:
What if the .csv file is separated with ; instead of , and some fields are wrapped in double quotes? For example, one of my rows is split across two lines: the quote opens in the last element of one line and closes at the end of the first element of the next line, because that cell contains a line break (\n). How should I handle this with this code? fgetcsv() handles elements within double quotes, but I don't think fgets() does.
function runCSVtoArray() {
// --> FOR IMPORT
//Function that converts a CSV file to a PHP array.
//echo '<span class="success">Chargement du fichier CSV pour importation MYSQL....</span><br />';
$readCharsPerLine = (JRequest::getVar('charsPerLine') > 0) ? JRequest::getVar('charsPerLine') : 1500; /* Imports as of 2012-04-16 seem to have at most 800 chars per line, so 1500 leaves plenty of margin. */
ini_set("auto_detect_line_endings", true);
iconv_set_encoding("internal_encoding", "UTF-8");
$openfile = $this->imp['importPath'].$this->imp['csvFileName'];
if ( file_exists($openfile) ) {
//echo '<span class="success">Fichier CSV trouvé....</span><br />';
//echo '<span class="success">Ouverture du fichier : '.$openfile.'</span><br />';
if (($handle = fopen($openfile, "r")) !== FALSE) {
//echo '<span class="success">Fichier CSV ouvert... Chargement en cours....</span><br />';
$row_i=0;
$this->_importData = array();
/*while (($data = fgetcsv($handle, $readCharsPerLine, ";")) !== FALSE) {*/
while (($the_line = fgets($handle)) !== FALSE) {
$data = explode(';', $the_line);
$debugoutput = implode('; ', $data).'<br />'; echo ( (JRequest::getVar('encodeutf8')) && ( mb_detect_encoding($debugoutput, "UTF-8") == "UTF-8") ) ? utf8_encode($debugoutput) : $debugoutput.'<br />'; //Debug2
/*
$num = count($data);
if ($row_i==0) {
// TITLE ROW
$keyRow = array();
for ($c=0; $c < $num; $c++) {
//Making title array with CSV first line
//Key for colum
if ( (JRequest::getVar('encodeutf8')) && ( mb_detect_encoding($data[$c], "UTF-8") == "UTF-8") ) { $data[$c] = utf8_encode($data[$c]); }
if ($data[$c]!="") {
$keyRow[$c]=trim($data[$c]);
$keyRow[$c]=str_replace('GDWACCENT', '', $keyRow[$c]); //STRIP GDWACCENT, GDW uTF8 fgetcsv fix
}
else { $keyRow[$c]=''; }
}
} else {
//VALUE ROW...
for ($c=0; $c < $num; $c++) {
$key = $keyRow[$c];
if ( (JRequest::getVar('encodeutf8')) && ( mb_detect_encoding($data[$c], "UTF-8") == "UTF-8") ) {
$data[$c] = utf8_encode($data[$c]);
$data[$c]=str_replace('GDWACCENT', '', $data[$c]); //STRIP GDWACCENT, GDW uTF8 fgetcsv fix
}
if ($data[$c]!="") {
$this->_importData[$row_i][$key]=trim($data[$c]);
$this->_importData[$row_i][$key]=str_replace('GDWACCENT', '', $this->_importData[$row_i][$key]); //STRIP GDWACCENT, GDW uTF8 fgetcsv fix
}
}
}
*/
$row_i++;
} //End while()
//echo '<span class="success">Chargement terminer.... Sauvegarde en cours...</span><br />';
return true;
} else {
//Unable to open the import file.
return false;
}
} else {
//FILE NOT FOUND...
return false;
}
} // runCSVtoArray()
Upvotes: 1
Views: 9130
Reputation: 760
I solved this by first reading the file with fopen() and fgets() instead of fgetcsv(), and writing a copy in which each line is passed through utf8_encode(). I then run that copy through fgetcsv().
Here is my updated code:
function runCSVtoArray() {
// --> FOR IMPORT
//Function that converts a CSV file to a PHP array.
//echo '<span class="success">Chargement du fichier CSV pour importation MYSQL....</span><br />';
$readCharsPerLine = (JRequest::getVar('charsPerLine') > 0) ? JRequest::getVar('charsPerLine') : 1500; /* Imports as of 2012-04-16 seem to have at most 800 chars per line, so 1500 leaves plenty of margin. */
putenv("LANG=fr_CA.UTF-8");
setlocale(LC_ALL, 'fr_CA.UTF-8');
//ini_set("auto_detect_line_endings", true);
//iconv_set_encoding("internal_encoding", "UTF-8");
$openfile = $this->imp['importPath'].$this->imp['csvFileName'];
$utf8File = str_replace('.csv', '_utf8.csv', $openfile);
if ( file_exists($openfile) ) {
//echo '<span class="success">Fichier CSV trouvé....</span><br />';
//rewrite the file in UTF8
if (JRequest::getVar('encodeutf8')) {
if (($handle = fopen($openfile, "r")) !== FALSE) {
$newFileHandle = fopen($utf8File, 'w'); //NEW UTF8 FORMAT
//fwrite($newFileHandle, "\xEF\xBB\xBF");
while (($the_line = fgets($handle)) !== FALSE) {
fwrite($newFileHandle, utf8_encode($the_line));
} //End of while()
fclose($newFileHandle); //Close the UTF-8 copy before it is reopened for reading
fclose($handle);
}
$openfile = $utf8File;
}
//echo '<span class="success">Ouverture du fichier : '.$openfile.'</span><br />';
if (($handle = fopen($openfile, "r")) !== FALSE) {
//echo '<span class="success">Fichier CSV ouvert... Chargement en cours....</span><br />';
$row_i=0;
$this->_importData = array();
while (($data = fgetcsv($handle, $readCharsPerLine, ";")) !== FALSE) {
/*while (($the_line = fgets($handle)) !== FALSE) {*/
//$data = explode(';', $the_line);
//$debugoutput = implode('; ', $data); echo ( (JRequest::getVar('encodeutf8')) && ( mb_detect_encoding($debugoutput, "UTF-8") == "UTF-8") ) ? utf8_encode($debugoutput).'<br />' : $debugoutput.'<br />'; //Debug2
//$debugoutput = implode('; ', $data); echo $debugoutput.'<br />'; //Debug2
$num = count($data);
if ($row_i==0) {
// TITLE ROW
$keyRow = array();
$maxItems = count($data); //Count the number of ";"
for ($c=0; $c < $num; $c++) {
//Making title array with CSV first line
//Key for colum
if ( (JRequest::getVar('encodeutf8')) && ( mb_detect_encoding($data[$c], "UTF-8") == "UTF-8") ) {
//$data[$c] = utf8_encode($data[$c]);
$data[$c] = $data[$c];
}
if ($data[$c]!="") {
$keyRow[$c]=trim($data[$c]);
$keyRow[$c]=str_replace('GDWACCENT', '', $keyRow[$c]); //STRIP GDWACCENT, GDW uTF8 fgetcsv fix
}
else { $keyRow[$c]=''; }
}
} else {
//VALUE ROW...
for ($c=0; $c < $num; $c++) {
$key = $keyRow[$c];
if ( (JRequest::getVar('encodeutf8')) && ( mb_detect_encoding($data[$c], "UTF-8") == "UTF-8") ) {
//$data[$c] = utf8_encode($data[$c]);
$data[$c] = $data[$c];
$data[$c]=str_replace('GDWACCENT', '', $data[$c]); //STRIP GDWACCENT, GDW uTF8 fgetcsv fix
}
if ($data[$c]!="") {
$this->_importData[$row_i][$key]=trim($data[$c]);
$this->_importData[$row_i][$key]=str_replace('GDWACCENT', '', $this->_importData[$row_i][$key]); //STRIP GDWACCENT, GDW uTF8 fgetcsv fix
}
} //End of for()
}
$row_i++;
} //End while()
//echo 'HERE<br />';
//gdwprint($this->_importData);
//exit();
//echo '<span class="success">Chargement terminer.... Sauvegarde en cours...</span><br />';
return true;
} else {
//Unable to open the import file.
return false;
}
} else {
//FILE NOT FOUND...
return false;
}
} // runCSVtoArray()
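The convert-then-parse idea above can be sketched more compactly. This is only a minimal sketch, assuming the source file is ISO-8859-1 (the encoding utf8_encode() converts from); the function name readSemicolonCsvAsUtf8() and the use of a php://temp stream instead of a second file on disk are illustrative choices, not part of the original code. It also shows that fgetcsv() keeps a quoted field together when that field contains a line break.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Assumes the source file is ISO-8859-1, the encoding utf8_encode() converts from.
function readSemicolonCsvAsUtf8(string $path, string $sourceEncoding = 'ISO-8859-1'): array
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }

    // Convert the whole file once instead of line by line.
    $utf8 = iconv($sourceEncoding, 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException("Cannot convert $path from $sourceEncoding");
    }

    // Keep the converted copy in a temporary stream rather than a second file.
    $handle = fopen('php://temp', 'r+');
    fwrite($handle, $utf8);
    rewind($handle);

    $rows = [];
    // Length 0 means "no line length limit"; delimiter is ';', enclosure is '"'.
    while (($data = fgetcsv($handle, 0, ';', '"', '\\')) !== false) {
        if ($data === [null]) {
            continue; // fgetcsv() reports a blank line as array(null)
        }
        $rows[] = $data;
    }
    fclose($handle);

    return $rows;
}
```

With a file whose second row is Éric;"ligne 1(line break)ligne 2", the second row should come back as two fields, the second one still containing the line break. If the file is actually Windows-1252, a different $sourceEncoding would have to be passed.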
Upvotes: 3
Reputation: 6249
In my experience, the input data for fgetcsv() must be in UTF-8.
In your case, if the É in Éric is dropped, then your input is not UTF-8 but probably some single-byte encoding instead (Windows-1252? Run echo bin2hex($str); to verify). There is a bug report in the PHP bug tracker (https://bugs.php.net/bug.php?id=55507). The solution is to convert the text to UTF-8 before feeding it to fgetcsv().
It is also important that the UTF-8 input does not contain a BOM.
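The two checks mentioned above (is the input really UTF-8, and does it start with a BOM) could look roughly like this. It is a sketch only: the helper names looksLikeUtf8() and stripUtf8Bom() are made up for illustration, and the validity test relies on preg_match() with the /u modifier failing on invalid UTF-8.

```php
<?php
// Hypothetical helpers for illustration; the names are not from the original post.
const UTF8_BOM = "\xEF\xBB\xBF";

// Returns true when $text is valid UTF-8.
// preg_match() with the /u modifier returns false on invalid UTF-8 input.
function looksLikeUtf8(string $text): bool
{
    return preg_match('//u', $text) === 1;
}

// Removes a leading UTF-8 byte order mark, if present.
function stripUtf8Bom(string $text): string
{
    return strncmp($text, UTF8_BOM, 3) === 0 ? substr($text, 3) : $text;
}

// "Éric" in a single-byte encoding (ISO-8859-1 / Windows-1252) versus UTF-8.
$singleByte = "\xC9ric";
$utf8       = "\xC3\x89ric";

echo bin2hex($singleByte), "\n"; // c9726963
echo bin2hex($utf8), "\n";       // c389726963
```

A lone c9 byte in front of the ASCII letters indicates a single-byte encoding, while c389 is the two-byte UTF-8 sequence for É.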
Upvotes: 1
Reputation: 158090
The answer you are relying on, which says fgetcsv() works only with ASCII characters, is simply wrong. What is true is this note from the PHP manual:
Note:
Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.
So you'll have to configure your LANG variable instead of using fgets().
Here is an example of how to set the LANG variable:
putenv("LANG=fr_FR.UTF-8");
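A locale name only takes effect if that locale is installed on the server, so it may be worth checking the result. The sketch below uses setlocale(), which accepts a list of candidates and returns false when none of them is available; the helper name useFirstAvailableLocale() and the candidate list are assumptions for illustration, and restricting the change to LC_CTYPE is a choice made here, not something the original answer prescribes.

```php
<?php
// Hypothetical helper for illustration; not part of the original answer.
// Tries each candidate in order and returns the locale that was applied,
// or null if none of them is installed on this system.
function useFirstAvailableLocale(array $candidates): ?string
{
    // setlocale() accepts an array and applies the first locale that exists.
    $applied = setlocale(LC_CTYPE, $candidates);

    return $applied === false ? null : $applied;
}

// 'C' is always available, so it serves as a last-resort fallback here.
$locale = useFirstAvailableLocale(['fr_FR.UTF-8', 'fr_CA.UTF-8', 'C']);
if ($locale === null) {
    echo "No usable locale found\n";
} else {
    echo "Character type locale: $locale\n";
}
```

If the result is 'C' rather than one of the UTF-8 locales, the requested locale is probably not installed, and setting LANG alone would not change how the file is read.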
Upvotes: 4