Reputation: 3
<?php
include('simple_html_dom.php');

$URL = "http://translate.google.com/?langpair=en|ja&text=math";

// Raw page source; echoing this shows the characters correctly.
$test = file_get_contents($URL);
//echo $test;

// Parsed with simple_html_dom; echoing the span's text shows garbled characters.
$html = file_get_html($URL);
foreach ($html->find('span.short_text') as $e) {
    echo $e->innertext;
}
?>
I'm trying to scrape Japanese kanji from Google Translate and get the characters to display correctly, but I'm having problems. As is, this code prints ”Šw. When I uncomment the "echo $test" line, it prints the correct characters, 数学 (along with a lot of other page content before them). I've tried encoding/decoding, htmlspecialchars, and so on, but none of that works. My second problem: when I manually write 数学 to a text file on my computer and try to view that file on my iPhone, it displays incorrectly, which is strange because I know the iPhone can render Japanese characters just fine. I was on Chrome, but now I'm on Firefox.
I can also get it to output the chars as: %C3%A6%E2%80%A2%C2%B0%C3%A5%C2%AD%C2%A6
Upvotes: 0
Views: 246
Reputation: 13729
This got encoded Asian characters to display for me:
$url = html_entity_decode($string, ENT_COMPAT, "UTF-8");
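A minimal sketch of what that call does, assuming the scraped text contains numeric HTML entities (the question does not confirm this). The `$encoded` value is a made-up stand-in for the scraped string, not real output from Google Translate:

```php
<?php
// Hypothetical input: 数学 written as decimal numeric character references.
$encoded = "&#25968;&#23398;";

// Decode the entities into UTF-8 bytes.
$decoded = html_entity_decode($encoded, ENT_COMPAT, "UTF-8");

echo $decoded;
```

If the scraped string holds raw bytes in another encoding rather than entities, this call will leave it unchanged.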
Upvotes: 0
Reputation: 7918
You have to convert the string's encoding:
mb_convert_encoding($str, $to_encoding [, $from_encoding])
Converts the character encoding of $str to $to_encoding, optionally from $from_encoding.
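As a sketch of how this might apply to the question, assume the scraped bytes are Shift_JIS. That is only a guess: the bytes for 数学 in Shift_JIS are 90 94 8A 77, and the last three render as ”Šw when read as Windows-1252, which matches the garbled output reported. The `$scraped` value below is a hard-coded stand-in for the scraped text, and the mbstring extension must be enabled:

```php
<?php
// Assumption: the source encoding is Shift_JIS (inferred from the garbled output, not confirmed).
// These four bytes are 数学 in Shift_JIS, standing in for the scraped string.
$scraped = "\x90\x94\x8A\x77";

// Convert from the assumed source encoding to UTF-8.
$utf8 = mb_convert_encoding($scraped, 'UTF-8', 'SJIS');

// When serving a web page, declare the output encoding so the browser does not guess.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

echo $utf8;
```

If the page turns out to use a different encoding, replace 'SJIS' with whatever the response's Content-Type header or meta charset tag declares.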
Upvotes: 1