Reputation: 536
Consider the following URL: click here
The page contains Japanese text. Firefox on my PC detects the encoding automatically and displays the characters. In Chrome, on the other hand, I have to change the encoding manually to "Shift_JIS" to see the Japanese characters.
If I fetch the content with PHP cURL, the text appears garbled, like this:
���ϕi�̂��ƂȂ��I�݂��Ȃ̃N�`�R�~�T�C�g�������������i�A�b�g�R�X���j�ɂ��܂����I
I tried:
curl_setopt($ch, CURLOPT_ENCODING, 'Shift_JIS');
I also tried (after downloading the curl response):
$output_str = mb_convert_encoding($curl_response, 'Shift_JIS', 'auto');
$output_str = mb_convert_encoding($curl_response, 'SJIS', 'auto');
But that does not work either.
Here is the full code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-US,en;q=0.5',
'Connection: keep-alive'
));
//curl_setopt($ch, CURLOPT_ENCODING, 'SJIS');
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$response = curl_exec($ch);
Upvotes: 5
Views: 2117
Reputation: 69937
That page doesn't return valid HTML; it's actually JavaScript. If you fetch it with curl and output it, add header('Content-type: text/html; charset=shift_jis'); to your code, and the characters will display properly when you load it in Chrome.
Since the HTML doesn't specify the character set, you can specify it from the server using header().
To actually convert the encoding so it will display properly in your terminal, you can try the following:
Use iconv() to convert to UTF-8:
$curl_response = iconv('shift-jis', 'utf-8', $curl_response);
Use mb_convert_encoding() to convert to UTF-8:
$curl_response = mb_convert_encoding($curl_response, 'utf-8', 'shift-jis');
Both of those methods worked for me and I was able to see Japanese characters displayed correctly on my terminal.
UTF-8 should be fine, but if you know your system is using something different, you can try that instead.
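As a minimal sketch of the conversion described above, the helper below wraps mb_convert_encoding() with a validity check. Note that mb_convert_encoding() takes the target encoding as its second argument and the source encoding as its third, so passing 'Shift_JIS' second (as in the question) converts in the wrong direction. Also, CURLOPT_ENCODING controls the Accept-Encoding header (compression such as gzip), not the character set. The function name sjis_to_utf8 and the sample bytes are assumptions for illustration only, and the mbstring extension is assumed to be enabled.

```php
<?php
// Hypothetical helper, not part of the original answer.
function sjis_to_utf8(string $bytes): string
{
    // Reject input that is not valid Shift_JIS rather than converting garbage.
    if (!mb_check_encoding($bytes, 'SJIS')) {
        throw new InvalidArgumentException('Input is not valid Shift_JIS');
    }
    // Argument order: string, target encoding, source encoding.
    return mb_convert_encoding($bytes, 'UTF-8', 'SJIS');
}

// Stand-in for the cURL response: the Shift_JIS bytes for "日本語".
$curl_response = "\x93\xFA\x96\x7B\x8C\xEA";
echo sjis_to_utf8($curl_response), PHP_EOL;
```

In the real script, $curl_response would be the string returned by curl_exec() rather than the hard-coded sample.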
Hope that helps.
Upvotes: 6
Reputation: 783
The following code will output the Japanese characters correctly in the browser:
<?php
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, $setUrlHere);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
// grab URL content
$response = curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
header('Content-type: text/html; charset=shift_jis');
echo $response;
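The code above hard-codes shift_jis in the header. If the remote server happens to send a charset in its own Content-Type header, it may be possible to reuse that value instead of guessing. The sketch below shows one way to pull the charset out of a Content-Type string; the function name charset_from_content_type and the Shift_JIS fallback are assumptions for illustration, not something the original answers prescribe.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Extracts the charset parameter from a Content-Type header value,
// returning $fallback when no charset is present.
function charset_from_content_type(?string $contentType, string $fallback = 'Shift_JIS'): string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([^";\s]+)/i', $contentType, $m)) {
        return $m[1];
    }
    return $fallback;
}

echo charset_from_content_type('text/html; charset=EUC-JP'), PHP_EOL; // prints "EUC-JP"
echo charset_from_content_type('text/html'), PHP_EOL;                 // prints "Shift_JIS"
```

With cURL, the input would typically come from curl_getinfo($ch, CURLINFO_CONTENT_TYPE), called before curl_close($ch). For the page in the question the server apparently does not declare a charset, so the fallback would be used.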
Upvotes: 0