Reputation: 1121
I am having trouble reading Unicode characters from a CSV file using PHP.
Below is a screenshot of the Unicode CSV file.
The PHP code I use is below.
$delimiter = ",";
$row = 1;
$handle = fopen($filePath, "r");
while (($data = fgetcsv($handle, 1000, $delimiter)) !== FALSE) {
    $num = count($data);
    $row++;
    for ($c = 0; $c < $num; $c++) {
        echo $data[$c];
    }
}
fclose($handle);
With the above code I get the output below in the Chrome browser. It contains junk characters.
But if I append a newline character in the echo statement, as below, the output is correct.
echo $data[$c]."\n";
Why does it behave like this? I do not want to append a newline like this.
Upvotes: 1
Views: 3315
Reputation: 536715
Regarding the "UNICODE csv file":
The encoding that Windows calls “Unicode” (misleadingly; Unicode is not an encoding) is actually UTF-16LE. This is a two-byte-per-code-unit encoding, so ASCII characters come out as the ASCII byte followed by a zero byte.
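To make the byte layout concrete, here is a minimal sketch that encodes a short ASCII string as UTF-16LE and prints both versions as hex. The sample string is made up for illustration, and the sketch assumes the mbstring extension is available.

```php
<?php
// Illustrative sample text, not taken from the question's file.
$text = "A,1\n";

// Re-encode the same characters as UTF-16LE (requires mbstring).
$utf16 = mb_convert_encoding($text, 'UTF-16LE', 'UTF-8');

echo bin2hex($text), "\n";   // 412c310a
echo bin2hex($utf16), "\n";  // 41002c0031000a00
```

Each ASCII byte is followed by a 0x00 byte, so the comma becomes 2c 00 and the newline becomes 0a 00.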
PHP's fgetcsv function doesn't support UTF-16 CSV; it only supports encodings that are ASCII-compatible. It splits on each 0x0A byte (newline) and each 0x2C byte (comma), but in UTF-16LE both the newline and the comma are two-byte sequences, 0x0A 0x00 and 0x2C 0x00 respectively. That means you get a single leading 0x00 byte on the front of every field except the first, and you get wrong splits whenever a value contains a 0x0A or 0x2C byte that is not part of a UTF-16-encoded newline or comma.
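When you print this out as UTF-16LE-encoded output, the extra 0x00 byte puts each field out of two-byte alignment with the previous one. The browser therefore sees alternating fields as misaligned and prints nonsense characters formed from the lead byte of one character and the trail byte of the one before it. Appending a single-byte "\n" after each field happens to restore the alignment, because the 0x0A byte pairs up with the stray 0x00 that starts the next field. That is why the version with the newline looks correct.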
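The stray 0x00 byte can be seen with a small sketch that feeds UTF-16LE bytes to fgetcsv through an in-memory stream. The input "a,b" is invented for the demonstration, and mbstring is assumed to be installed.

```php
<?php
// "a,b\n" in UTF-16LE is the byte sequence 61 00 2c 00 62 00 0a 00.
$utf16 = mb_convert_encoding("a,b\n", 'UTF-16LE', 'UTF-8');

// Put the bytes into an in-memory stream so fgetcsv can read them.
$handle = fopen('php://memory', 'w+b');
fwrite($handle, $utf16);
rewind($handle);

// fgetcsv splits on the single bytes 0x2c and 0x0a, not on UTF-16 code units.
$fields = fgetcsv($handle, 1000, ',', '"', '\\');
fclose($handle);

// Show the raw bytes of each field.
$hex = array_map('bin2hex', $fields);
foreach ($hex as $i => $h) {
    echo "field $i: $h\n";
}
```

The second field should begin with 00, the orphaned second half of the UTF-16LE comma, which gives it an odd byte length and shifts everything printed after it.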
So there are two possible things you can do:
1. If you have any choice in the matter, avoid UTF-16. Because it's not ASCII-compatible, it breaks lots of tools that expect ASCII compatibility. Generally the best encoding is UTF-8, which can represent all characters and is still an ASCII superset. Unfortunately, Excel (at least in older versions) refuses to save CSV files directly in UTF-8.
2. Use some other CSV parser that understands UTF-16. It's a good idea to avoid PHP's CSV functions anyway, because they do weird things that don't match standard CSV (in as much as there is a standard; at the least, they don't match RFC 4180 or what Excel produces).
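If the file cannot be changed at the source, one possible workaround in the spirit of the first option is to convert the bytes to UTF-8 inside PHP before parsing. The sketch below is only an illustration: readUtf16Csv is a hypothetical helper name, the sample data is invented, and it assumes mbstring is installed and the input really is UTF-16LE. It still relies on fgetcsv, so the caveats about PHP's CSV functions continue to apply.

```php
<?php
// Hypothetical helper (not a PHP built-in): convert UTF-16LE CSV bytes
// to UTF-8, then parse them with fgetcsv.
function readUtf16Csv(string $bytes, string $delimiter = ','): array
{
    // Assumes the input is UTF-16LE; requires the mbstring extension.
    $utf8 = mb_convert_encoding($bytes, 'UTF-8', 'UTF-16LE');

    // Drop the byte order mark, which becomes EF BB BF after conversion.
    if (strncmp($utf8, "\xEF\xBB\xBF", 3) === 0) {
        $utf8 = substr($utf8, 3);
    }

    // Parse from a temporary stream so fgetcsv sees ASCII-compatible bytes.
    $handle = fopen('php://temp', 'w+b');
    fwrite($handle, $utf8);
    rewind($handle);

    $rows = [];
    while (($data = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
        $rows[] = $data;
    }
    fclose($handle);

    return $rows;
}

// Invented sample: BOM followed by two CRLF-terminated lines in UTF-16LE.
$sample = "\xFF\xFE" . mb_convert_encoding("id,name\r\n1,Zoë\r\n", 'UTF-16LE', 'UTF-8');
$rows = readUtf16Csv($sample);

foreach ($rows as $row) {
    echo implode(' | ', $row), "\n";
}
```

With a real file, the call would look like readUtf16Csv(file_get_contents($filePath)), and the page should then be served with a UTF-8 charset so the browser decodes it correctly.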
Upvotes: 2
Reputation: 594
Try adding this header before outputting the text:
header('Content-Type: text/html; charset=utf-8');
$delimiter = ",";
$row = 1;
$handle = fopen($filePath, "r");
while (($data = fgetcsv($handle, 1000, $delimiter)) !== FALSE) {
    $num = count($data);
    $row++;
    for ($c = 0; $c < $num; $c++) {
        echo $data[$c];
    }
}
fclose($handle);
Upvotes: 0