hvs

Reputation: 536

PHP cURL Japanese output garbled

Consider the following URL: click here

The page contains text encoded as Japanese characters. Firefox on my PC detects the encoding automatically and shows the characters. In Chrome, on the other hand, I have to change the encoding manually to "Shift_JIS" to see the Japanese characters.

If I try to access the content via PHP cURL, the encoded text appears garbled like this:

���ϕi�̂��ƂȂ��I�݂��Ȃ̃N�`�R�~�T�C�g�������������i�A�b�g�R�X���j�ɂ��܂����I

I tried:

  curl_setopt($ch, CURLOPT_ENCODING, 'Shift_JIS');

I also tried (after downloading the curl response):

  $output_str = mb_convert_encoding($curl_response, 'Shift_JIS', 'auto');
  $output_str = mb_convert_encoding($curl_response, 'SJIS', 'auto');

But that does not work either.

Here is the full code:

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.5',
        'Connection: keep-alive'
    ));

    //curl_setopt($ch, CURLOPT_ENCODING, 'SJIS');
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $response = curl_exec($ch);

Upvotes: 5

Views: 2117

Answers (2)

drew010

Reputation: 69937

That page doesn't return valid HTML; it's actually JavaScript. If you fetch it with curl and output it, add header('Content-type: text/html; charset=shift_jis'); to your code, and the characters will display properly when you load it in Chrome.

Since the response doesn't specify the character set, you can specify it from the server using header().

To actually convert the encoding so it will display properly in your terminal, you can try the following:

Use iconv() to convert to UTF-8:

$curl_response = iconv('shift-jis', 'utf-8', $curl_response);

Use mb_convert_encoding() to convert to UTF-8:

$curl_response = mb_convert_encoding($curl_response, 'utf-8', 'shift-jis');

Both of those methods worked for me and I was able to see Japanese characters displayed correctly on my terminal.

UTF-8 should be fine, but if you know your system is using something different, you can try that instead.
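
As a minimal sketch of the conversion above, here is a small self-contained script that needs no network access. The helper name sjis_to_utf8() and the sample bytes are made up for illustration, and the mbstring extension is assumed to be loaded. Note the argument order of mb_convert_encoding(): string, target encoding, source encoding. The calls in the question pass 'Shift_JIS' as the second argument, which makes it the target rather than the source, and that is probably why they did not help.

```php
<?php
// Hypothetical helper (not part of any library): convert a Shift_JIS body to UTF-8.
// Assumes the mbstring extension is available.
function sjis_to_utf8(string $body): string
{
    // Argument order: (string, target encoding, source encoding).
    return mb_convert_encoding($body, 'UTF-8', 'SJIS');
}

// The word "日本語" as raw Shift_JIS bytes, standing in for a cURL response body.
$sjisBytes = "\x93\xfa\x96\x7b\x8c\xea";
$utf8 = sjis_to_utf8($sjisBytes);

// Check that the result is well-formed UTF-8, then print it.
var_dump(mb_check_encoding($utf8, 'UTF-8'));
echo $utf8, PHP_EOL;
```

In the real script, $sjisBytes would be replaced by the string returned from curl_exec().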

Hope that helps.

Upvotes: 6

Suleman C

Reputation: 783

The following code will output the Japanese characters correctly in the browser:

<?php

// create a new cURL resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, $setUrlHere);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);

// grab URL content
$response = curl_exec($ch);

// close cURL resource, and free up system resources
curl_close($ch);

header('Content-type: text/html; charset=shift_jis');
echo $response;
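
The code above hard-codes shift_jis in the header. A possible variation, sketched here under assumptions, is to reuse whatever charset the remote server declares and fall back to shift_jis only when none is given. The helper charset_from_content_type() is hypothetical; with cURL, its input would come from curl_getinfo($ch, CURLINFO_CONTENT_TYPE), read before curl_close($ch).

```php
<?php
// Hypothetical helper: pull the charset parameter out of a Content-Type value,
// or return a fallback when the server did not declare one.
function charset_from_content_type(?string $contentType, string $fallback = 'shift_jis'): string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)/i', $contentType, $m)) {
        return $m[1];
    }
    return $fallback;
}

// Sample header values for illustration only.
$declared = charset_from_content_type('text/html; charset=EUC-JP');
$missing  = charset_from_content_type('application/x-javascript');

echo $declared, PHP_EOL; // charset taken from the header value
echo $missing, PHP_EOL;  // no charset declared, so the fallback is used
```

The returned value could then be placed in the header() call instead of the fixed string.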

Upvotes: 0
