Tanvir Ul Haque

Reputation: 185

PHP str_getcsv() does not parse CSV correctly if it contains Japanese characters

I am trying to convert an Excel file to an array using the file() function and str_getcsv(). Some fields contain Japanese characters, and for those fields I am not getting the correct data.

Here is the relevant line of code:

$data = array_map('str_getcsv', file($path));

Upvotes: 1

Views: 2622

Answers (2)

Tanvir Ul Haque

Reputation: 185

I have solved the problem by using

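$csv_data[$i][2] = mb_convert_encoding($csv_data[$i][2], "UTF-8", "SJIS");

This converts the Shift-JIS encoded text in that field to UTF-8. The return value has to be assigned back, and the call has to be repeated for every field that may contain Japanese text.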
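As a variation on the per-field conversion above, here is a minimal sketch that converts the whole file to UTF-8 before parsing it. The reasoning is that some Shift-JIS characters have 0x5C (the backslash, which is the default escape character of str_getcsv()) as their second byte, so parsing the raw bytes first may misread such fields. The helper name read_csv_as_utf8 and its parameters are hypothetical, not from the original post, and the source encoding is assumed to be the same "SJIS" used above.

```php
<?php
// Hypothetical helper (not from the original post): reads a CSV file stored in
// a legacy encoding and returns its rows as arrays of UTF-8 strings.
function read_csv_as_utf8(string $path, string $fromEncoding = 'SJIS'): array
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read {$path}");
    }

    // Convert the whole file first, so the parser only ever sees UTF-8.
    $utf8 = mb_convert_encoding($raw, 'UTF-8', $fromEncoding);

    // Split on any newline style. Like file(), this assumes that no quoted
    // field contains a line break.
    $lines = preg_split('/\r\n|\r|\n/', rtrim($utf8, "\r\n"));

    // Pass delimiter, enclosure and escape explicitly (these are the defaults).
    return array_map(
        fn(string $line): array => str_getcsv($line, ',', '"', '\\'),
        $lines
    );
}
```

Calling read_csv_as_utf8($path) would then stand in for array_map('str_getcsv', file($path)) from the question. If the file might come in other encodings, the second argument is the place to change it.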

Upvotes: 1

akky

Reputation: 2907

Without details, such as which Japanese characters were in the input and how they came out wrong, I can only guess.

str_getcsv() takes the system locale into account, so setting a Japanese locale might fix the issue.

This code

setlocale(LC_ALL, 'ja_JP');
$data = array_map('str_getcsv', file('japanese.csv'));
var_dump($data);

works with the following CSV file (japanese.csv, saved in UTF-8) on my local machine.

日本語,テスト,ファイル
2行目,CSV形式,エンコードUTF-8

The result is

array(2) {
  [0]=>
  array(3) {
    [0]=>
    string(9) "日本語"
    [1]=>
    string(9) "テスト"
    [2]=>
    string(12) "ファイル"
  }
  [1]=>
  array(3) {
    [0]=>
    string(7) "2行目"
    [1]=>
    string(9) "CSV形式"
    [2]=>
    string(20) "エンコードUTF-8"
  }
}

As you can see, str_getcsv() requires you to know which language the input CSV file uses. In this case you may be sure that the input is always Japanese, but the approach does not work for parsing CSV whose language is unpredictable. You also need to be careful, because the requested locale could be missing when your code runs in a different environment.
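One way to notice a missing locale is to check what setlocale() returns: it gives back the name of the locale it applied, or false if none of the requested ones is installed. The sketch below assumes a few common spellings of the Japanese locale name, and the function name set_first_available_locale is hypothetical.

```php
<?php
// Hypothetical helper: tries each candidate locale in order and returns the
// name of the one that was applied, or null if none is installed.
function set_first_available_locale(array $candidates, int $category = LC_ALL): ?string
{
    // setlocale() accepts an array and uses the first locale that exists.
    $applied = setlocale($category, $candidates);

    return $applied === false ? null : $applied;
}

// The spellings below are assumptions; which ones exist depends on the system.
$locale = set_first_available_locale(['ja_JP.UTF-8', 'ja_JP.utf8', 'ja_JP']);
if ($locale === null) {
    // No Japanese locale installed: report it instead of parsing silently.
    fwrite(STDERR, "Japanese locale not available on this system\n");
}
```

Failing loudly here makes it easier to see that a difference in parsing comes from the environment rather than from the CSV file.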

Upvotes: 2
