sdolgy

Reputation: 7001

Decoding the JSON output from the Microsoft Translator API with PHP

This issue seems specific to microsofttranslator.com, so please test any answers against it if you can.

I'm using the following URL for translation: http://api.microsofttranslator.com/V2/Ajax.svc/TranslateArray. I send it some arguments via cURL and get back the following result:

 [
      {
           "From":"en",
           "OriginalTextSentenceLengths":[13],
           "TranslatedText":"我是最好的",
           "TranslatedTextSentenceLengths":[5]
      },
      {
           "From":"en",
           "OriginalTextSentenceLengths":[16],
           "TranslatedText":"你是最好的",
           "TranslatedTextSentenceLengths":[5]
      }
 ]

When I call json_decode($output, true) on the output from cURL, decoding fails and json_last_error() reports a syntax error in the returned JSON:

 json_last_error() == JSON_ERROR_SYNTAX

The headers being returned with the JSON:

Response Headers

 Cache-Control:no-cache
 Content-Length:244
 Content-Type:application/x-javascript; charset=utf-8
 Date:Sat, 06 Aug 2011 13:35:08 GMT
 Expires:-1
 Pragma:no-cache
 X-MS-Trans-Info:s=63644

Raw content:

 [{"From":"en","OriginalTextSentenceLengths":[13],"TranslatedText":"我是最好的","TranslatedTextSentenceLengths":[5]},{"From":"en","OriginalTextSentenceLengths":[16],"TranslatedText":"你是最好的","TranslatedTextSentenceLengths":[5]}]

cURL code:

    $texts = array("i am the best" => 0, "you are the best" => 0);
    $ch = curl_init(); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $data = array(
        'appId' => $bing_appId,
        'from' => 'en',
        'to' => 'zh-CHS',
        'texts' => json_encode(array_keys($texts))
    );
    curl_setopt($ch, CURLOPT_URL, $bingArrayUrl . '?' . http_build_query($data)); 
    $output = curl_exec($ch); 

Upvotes: 4

Views: 3474

Answers (2)

Cedric Han

Reputation: 174

The API is prepending a byte order mark (BOM) to the response.
The string data itself is UTF-8, but it is prefixed with U+FEFF, which UTF-8 encodes as the three bytes EF BB BF. json_decode does not accept a leading BOM, so strip the first three bytes and then call json_decode.

...
$output = curl_exec($ch);
// Insert some sanity checks here... then,
$output = substr($output, 3);
...
$decoded = json_decode($output, true);
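
Because `substr($output, 3)` removes three bytes unconditionally, it would corrupt a response that arrives without the BOM. A minimal sketch of a more defensive variant follows. It assumes a hypothetical helper named `strip_utf8_bom` (not part of PHP or the Translator API) and uses a hard-coded string in place of the real cURL output:

```php
<?php
// Hypothetical helper: removes a leading UTF-8 BOM (bytes EF BB BF) only if present.
function strip_utf8_bom($s)
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($s, $bom, 3) === 0) {
        // Cast guards against substr() returning false on older PHP versions
        // when the input consists of the BOM alone.
        return (string) substr($s, 3);
    }
    return $s;
}

// Stand-in for the body returned by curl_exec(): a BOM followed by JSON.
$output = "\xEF\xBB\xBF" . '[{"From":"en","TranslatedText":"我是最好的"}]';

$decoded = json_decode(strip_utf8_bom($output), true);
print_r($decoded);
```

A string that has no BOM passes through the helper unchanged, so the same call should keep working if the service ever stops sending the prefix.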

Here's the entirety of my test code.

$texts = array("i am the best" => 0, "you are the best" => 0);
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = array(
    'appId' => $bing_appId,
    'from' => 'en',
    'to' => 'zh-CHS',
    'texts' => json_encode(array_keys($texts))
    );
curl_setopt($ch, CURLOPT_URL, $bingArrayUrl . '?' . http_build_query($data)); 
$output = curl_exec($ch);
$output = substr($output, 3);
print_r(json_decode($output, true));

Which gives me

Array
(
    [0] => Array
        (
            [From] => en
            [OriginalTextSentenceLengths] => Array
                (
                    [0] => 13
                )

            [TranslatedText] => 我是最好的
            [TranslatedTextSentenceLengths] => Array
                (
                    [0] => 5
                )

        )

    [1] => Array
        (
            [From] => en
            [OriginalTextSentenceLengths] => Array
                (
                    [0] => 16
                )

            [TranslatedText] => 你是最好的
            [TranslatedTextSentenceLengths] => Array
                (
                    [0] => 5
                )

        )

)
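
To confirm that a BOM is the culprit, it can help to dump the first few bytes of the response as hex. The sketch below uses a hard-coded string in place of the real cURL output (an assumption for illustration only). It is also consistent with the headers in the question: by my count the raw JSON shown there is 241 bytes, while Content-Length is 244, which leaves exactly three bytes for the BOM.

```php
<?php
// Stand-in for the body returned by curl_exec(); the real response is assumed
// to begin with the same three BOM bytes.
$output = "\xEF\xBB\xBF" . '[{"From":"en","OriginalTextSentenceLengths":[13]}]';

// Hex of the first three bytes; "efbbbf" indicates a UTF-8 encoded BOM.
$prefix = bin2hex(substr($output, 0, 3));
echo $prefix, "\n";

// Decoding the raw body fails (json_decode returns null)...
$with_bom = json_decode($output, true);
var_dump($with_bom);

// ...while decoding after dropping the three BOM bytes succeeds.
$without_bom = json_decode(substr($output, 3), true);
print_r($without_bom);
```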

Wikipedia entry on BOM

Upvotes: 6

vicTROLLA

Reputation: 1529

There is nothing syntactically wrong with your JSON string as posted. It is possible that the JSON is coming back with byte sequences that are not valid UTF-8, but in that case json_decode() would return null and json_last_error() would report JSON_ERROR_UTF8 rather than JSON_ERROR_SYNTAX.
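
For reference, here is a short sketch of how json_decode() reports two different failure modes. It signals failure by returning null, and the reason is available through json_last_error(). Both inputs are made up for illustration and are not taken from the actual API response:

```php
<?php
// Illustrative input: a lone 0xB1 byte inside a string is not valid UTF-8.
$bad_utf8 = "[\"\xB1\"]";
$r1 = json_decode($bad_utf8, true);
$e1 = json_last_error();

// Illustrative input: valid JSON preceded by a UTF-8 encoded BOM.
$with_bom = "\xEF\xBB\xBF[1]";
$r2 = json_decode($with_bom, true);
$e2 = json_last_error();

// Both calls return null; the error codes are what tell the cases apart.
var_dump($r1, $e1 === JSON_ERROR_UTF8);
var_dump($r2, $e2 === JSON_ERROR_SYNTAX);
```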

Test Code:

<?php

$json = '
 [
      {
           "From":"en",
           "OriginalTextSentenceLengths":[13],
           "TranslatedText":"我是最好的",
           "TranslatedTextSentenceLengths":[5]
      },
      {
           "From":"en",
           "OriginalTextSentenceLengths":[16],
           "TranslatedText":"你是最好的",
           "TranslatedTextSentenceLengths":[5]
      }
 ]
';

$out = json_decode($json, true);

// json_decode() returns null on failure; json_last_error() gives the reason.
// json_last_error_msg() requires PHP 5.5 or later.
if ($out === null && json_last_error() !== JSON_ERROR_NONE) {
    throw new Exception(json_last_error_msg());
} else {
    print_r($out);
}

?>

Output:

$ php -f jsontest.php 
Array
(
    [0] => Array
        (
            [From] => en
            [OriginalTextSentenceLengths] => Array
                (
                    [0] => 13
                )

            [TranslatedText] => 我是最好的
            [TranslatedTextSentenceLengths] => Array
                (
                    [0] => 5
                )

        )

    [1] => Array
        (
            [From] => en
            [OriginalTextSentenceLengths] => Array
                (
                    [0] => 16
                )

            [TranslatedText] => 你是最好的
            [TranslatedTextSentenceLengths] => Array
                (
                    [0] => 5
                )

        )

)

Upvotes: 1
