Sebastian

Reputation: 3

How can I decode Asian characters correctly using PHP?

<?php

include('simple_html_dom.php');

$test = file_get_contents('http://translate.google.com/?langpair=en|ja&text=math');
//echo $test;

$URL = "http://translate.google.com/?langpair=en|ja&text=math";
$html = file_get_html($URL);

foreach ($html->find('span.short_text') as $e) {
    echo $e->innertext;
}

?>

I'm trying to scrape Japanese kanji from Google Translate and display the characters correctly, but I'm having problems. As is, this code prints ”Šw. When I uncomment the "echo $test" line, it prints the correct characters, 数学 (along with a bunch of other output before them). I've tried encoding and decoding the string, htmlspecialchars, and so on, but none of that works.

My second problem: when I manually write 数学 to a text file on my computer and try to view that file from my iPhone, it shows up garbled, which is strange because I know the iPhone can display Japanese characters just fine. I was on Chrome, but now I'm on Firefox.

I can also get it to output the chars as: %C3%A6%E2%80%A2%C2%B0%C3%A5%C2%AD%C2%A6

Upvotes: 0

Views: 246

Answers (2)

John

Reputation: 13729

This displayed the encoded Asian characters correctly for me:

$url = html_entity_decode($string,ENT_COMPAT,"UTF-8");
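
Note that html_entity_decode only helps when the scraped text actually contains HTML entities. A minimal sketch, assuming the input arrives as numeric entities (the sample value of $string below is made up for illustration and is not taken from the Google Translate response):

```php
<?php
// Assumption: the scraped text is in numeric-entity form.
// &#25968; is U+6570 (数) and &#23398; is U+5B66 (学).
$string = '&#25968;&#23398;';

// Decode the entities into UTF-8 encoded characters.
$decoded = html_entity_decode($string, ENT_COMPAT, 'UTF-8');

// The page that echoes this must also be served as UTF-8,
// e.g. header('Content-Type: text/html; charset=utf-8');
echo $decoded, PHP_EOL;
```

If the string holds raw bytes in another encoding rather than entities, this call leaves them unchanged and the encoding itself has to be converted instead.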

Upvotes: 0

Vineet1982

Reputation: 7918

You have to change the encoding of the string

mb_convert_encoding($str_to_convert, $to_encoding, $from_encoding)

Converts the character encoding of $str_to_convert to $to_encoding, optionally from $from_encoding.
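
As a rough sketch of how that could apply here: the garbled ”Šw output looks like the Shift_JIS bytes for 数学 (90 94 8A 77) being shown as Windows-1252, so the example below assumes the source encoding is Shift_JIS. That is an inference from the symptom, not something the page confirms, so check the charset the response actually declares. It also assumes the mbstring extension is enabled. The $raw value is a hard-coded stand-in for the scraped text.

```php
<?php
// Assumption: the scraped bytes are Shift_JIS.
// These four bytes are 数学 in Shift_JIS.
$raw = "\x90\x94\x8A\x77";

// Convert to UTF-8, naming the source encoding explicitly
// instead of relying on mbstring's internal default.
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'SJIS');

// Serve the page as UTF-8 so the browser does not guess,
// e.g. header('Content-Type: text/html; charset=utf-8');
echo $utf8, PHP_EOL;
```

In the question's loop, the same call would wrap $e->innertext before echoing it. Passing the wrong $from_encoding produces different garbage rather than an error, so it is worth confirming the source charset first.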

Upvotes: 1
