Reputation: 2290
I have read a lot of articles but I still don't get it.
I'm importing text from a file using:
$fp = fopen($storagename, 'r');
while ( !feof($fp) ){
    $line = fgets($fp, 2048);
    $delimiter = "\t";
    $data = str_getcsv($line, $delimiter);
    print_r($data);
}
To display numbers and English characters correctly I had to use:
str_replace("\x00", '', $data[7])
But now Hebrew characters end up displayed as �
I have tried converting with iconv, mb_convert_encoding and utf8_decode/utf8_encode, but nothing helps.
Any assistance would be great.
Upvotes: 1
Views: 720
Reputation: 146660
UCS-2 is an older version of UTF-16, so you should probably try both (auto-detecting a text encoding is not a bullet-proof job).
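Since detection is unreliable, one cheap check that sometimes helps is to look for a byte order mark at the start of the file. This is only a sketch: guessUtf16ByteOrder() is a made-up helper name, it assumes the mbstring extension is loaded, and many files carry no BOM at all, in which case you still need to know the encoding some other way.

```php
<?php
// Hypothetical helper: guess the UTF-16 byte order from a byte order mark (BOM).
// Returns null when there is no UTF-16 BOM, so the caller must fall back to
// knowledge about the file (a BOM is optional and often missing).
function guessUtf16ByteOrder(string $bytes): ?string
{
    $bom = substr($bytes, 0, 2);
    if ($bom === "\xFF\xFE") {
        return 'UTF-16LE';
    }
    if ($bom === "\xFE\xFF") {
        return 'UTF-16BE';
    }
    return null;
}

// Sample data: the Hebrew word "shalom" as UTF-16LE, preceded by a BOM
$hebrew = "\u{05E9}\u{05DC}\u{05D5}\u{05DD}";
$sample = "\xFF\xFE" . mb_convert_encoding($hebrew, 'UTF-16LE', 'UTF-8');

$encoding = guessUtf16ByteOrder($sample);
// Skip the 2-byte BOM before converting to UTF-8
$text = mb_convert_encoding(substr($sample, 2), 'UTF-8', $encoding);
```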
We have the source encoding. We can speculate the target encoding is UTF-8 (because it's the sensible choice in 2016 and your question is actually tagged as UTF-8). So we have all we need.
We should first remove non-standard raw byte manipulations (e.g. remove str_replace("\x00", '', $data[7]) and similar code). We can then do a proper conversion. If you use mb_convert_encoding(), an initial approach could be:
$delimiter = "\t";
$fp = fopen($storagename, 'r');
while ( !feof($fp) ){
    $line = mb_convert_encoding(fgets($fp, 2048), 'UTF-8', 'UCS-2LE');
    $data = str_getcsv($line, $delimiter);
    print_r($data);
}
You can check the list of supported encodings.
But we have a potential problem here: there's no way to tell fgets() about the file encoding, so it's unlikely that it will recognise your UCS-2 line endings.
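To see the line-ending problem concretely, here is a small sketch using an in-memory stream instead of your real file (the sample data is made up). In UCS-2LE a newline is the two bytes 0x0A 0x00, but fgets() stops right after the single byte 0x0A, so every chunk ends in the middle of a character.

```php
<?php
// Two tab-separated lines, encoded as UCS-2LE and written to an in-memory stream
$utf8 = "a\tb\nc\td\n";
$fp = fopen('php://memory', 'w+');
fwrite($fp, mb_convert_encoding($utf8, 'UCS-2LE', 'UTF-8'));
rewind($fp);

$first  = fgets($fp, 2048); // ends right after the 0x0A byte
$second = fgets($fp, 2048); // begins with the 0x00 that belonged to "\n"
fclose($fp);

// Hex dumps make the misalignment visible
echo bin2hex($first), "\n";  // 6100090062000a
echo bin2hex($second), "\n"; // 006300090064000a
```

The first chunk has an odd number of bytes and the second one starts with a stray 0x00, so converting each chunk from UCS-2LE on its own cannot give clean text.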
You can try different solutions depending on the size of the CSV file. If it's small, I'd simply convert it all at once. Otherwise, I'd have a look at stream_get_line():
This function is nearly identical to fgets() except in that it allows end of line delimiters other than the standard \n, \r, and \r\n, and does not return the delimiter itself.
It'd be something like this:
$ending = mb_convert_encoding("\n", 'UCS-2LE', 'UTF-8');
$line = mb_convert_encoding(stream_get_line($fp, 2048, $ending), 'UTF-8', 'UCS-2LE');
This should work with both Unix line endings (\n) and Windows ones (\r\n).
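Putting the pieces together, a complete loop could look roughly like this. Treat it as a sketch under assumptions: readUcs2leRows() is a made-up helper name, the file is assumed to be UCS-2LE, no line is assumed to be longer than 2048 bytes, and the BOM stripping is a guess about what your file starts with. With Windows line endings the "\r" stays at the end of each converted line, so it is trimmed explicitly.

```php
<?php
// Hypothetical helper: read tab-separated rows from a UCS-2LE stream and
// return them as arrays of UTF-8 strings.
function readUcs2leRows($fp, string $delimiter = "\t"): array
{
    // "\n" as it appears in the file: the two bytes 0x0A 0x00
    $ending = mb_convert_encoding("\n", 'UCS-2LE', 'UTF-8');
    $rows = [];
    while (($raw = stream_get_line($fp, 2048, $ending)) !== false) {
        $line = mb_convert_encoding($raw, 'UTF-8', 'UCS-2LE');
        // A BOM at the start of the file shows up as U+FEFF after conversion
        if (strncmp($line, "\xEF\xBB\xBF", 3) === 0) {
            $line = substr($line, 3);
        }
        // Windows files leave a "\r" before the "\n" we split on
        $line = rtrim($line, "\r");
        if ($line === '') {
            continue; // skip empty lines
        }
        // Enclosure and escape are passed explicitly (same as the classic defaults)
        $rows[] = str_getcsv($line, $delimiter, '"', '\\');
    }
    return $rows;
}

// Usage with made-up sample data in an in-memory stream (Windows line endings)
$hebrew = "\u{05E9}\u{05DC}\u{05D5}\u{05DD}";
$fp = fopen('php://memory', 'w+');
fwrite($fp, mb_convert_encoding("a\t{$hebrew}\r\nc\td\r\n", 'UCS-2LE', 'UTF-8'));
rewind($fp);
$rows = readUcs2leRows($fp);
fclose($fp);
print_r($rows);
```

For your real file you would pass the handle returned by fopen($storagename, 'r') instead of the in-memory stream.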
Upvotes: 3