Reputation: 1032
It seems that file_get_contents() automatically encodes the URL with urlencode(), even if the URL is supplied as Unicode. It's easy to replicate: try the following code. (The PHP script must be saved in a Unicode encoding such as UTF-8, and the uploads folder must have write permissions.)
<?php
$mp3 = file_get_contents ( "http://translate.google.com/translate_tts?tl=pt&q={rotações}" );
file_put_contents ( "uploads/test.mp3", $mp3 );
echo "<audio id=\"player\" src=\"uploads/test.mp3\"></audio>";
echo "<button onclick=\"document.getElementById('player').play()\">Play</button>";
?>
It should save the sound file for the Portuguese word "rotações". Instead it saves a funny-sounding reading of "rota%C3%A7%C3%B5es". This is easy to confirm: explicitly wrapping the text in urlencode() produces the same result.
But if you put the URL http://translate.google.com/translate_tts?tl=pt&q={rotações} into the browser's address bar, you'll hear the correct sound!
The same problem occurs if the Unicode URL isn't hardcoded in the script but comes from a database.
So my question is: how can I force PHP to request the correct Unicode URL, without it being processed by urlencode()?
P.S. I tried replacing file_get_contents() with a cURL implementation, as described in "PHP file_get_contents specific encoding", but it had no effect.
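For reference, the percent-encoded form mentioned above can be reproduced without any network request. This is only a minimal sketch: the word is written as explicit UTF-8 bytes (an assumption made here so the result does not depend on how the script file is saved), and the variable names are illustrative.

```php
<?php
// "rotações" spelled out as UTF-8 bytes: ç = C3 A7, õ = C3 B5.
$word = "rota\xC3\xA7\xC3\xB5es";

// urlencode() percent-encodes every non-ASCII byte of the string.
$encoded = urlencode($word);
echo $encoded, "\n"; // rota%C3%A7%C3%B5es

// Decoding restores the original bytes, so no information is lost.
echo urldecode($encoded) === $word ? "round trip ok" : "mismatch", "\n";
```

So "rota%C3%A7%C3%B5es" is simply the UTF-8 byte sequence of the word in percent-encoded form.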
Upvotes: 1
Views: 1368
Reputation: 41885
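For this to work, you need to append an additional &ie=UTF-8 parameter to your query string. The ie parameter declares the input encoding, so the service decodes the percent-encoded bytes in q as UTF-8.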
So this would look like:
http://translate.google.com/translate_tts?tl=pt&q=rota%C3%A7%C3%B5es&ie=UTF-8
In the code:
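// Percent-encode the UTF-8 text so it is safe to use in a query string.
$text = urlencode('rotações');
// ie=UTF-8 declares the encoding of the q parameter.
$url = "http://translate.google.com/translate_tts?tl=pt&q={$text}&ie=UTF-8";
$mp3 = file_get_contents($url);
// Save the returned audio so the page can play it.
file_put_contents('uploads/test.mp3', $mp3);
echo "<audio id=\"player\" src=\"uploads/test.mp3\"></audio>";
echo "<button onclick=\"document.getElementById('player').play()\">Play</button>";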
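If the text comes from a database or user input rather than a literal, the same URL can be assembled with http_build_query(), which percent-encodes each value for you. The following is only a sketch under that assumption: build_tts_url() is a hypothetical helper name, not part of PHP or of any Google API, and it only builds the URL without sending a request.

```php
<?php
// Hypothetical helper: builds the translate_tts URL for a given text.
function build_tts_url(string $text, string $lang): string
{
    // http_build_query() percent-encodes every value, so the raw
    // UTF-8 text is passed in as-is (do not urlencode() it first).
    $query = http_build_query([
        'tl' => $lang,
        'q'  => $text,
        'ie' => 'UTF-8', // declare that q is UTF-8
    ]);

    return 'http://translate.google.com/translate_tts?' . $query;
}

// The word is given as explicit UTF-8 bytes so the example does not
// depend on the encoding the script file is saved in.
$url = build_tts_url("rota\xC3\xA7\xC3\xB5es", 'pt');
echo $url, "\n";
```

The resulting string can be passed to file_get_contents() exactly as in the code above. Building the query from an array avoids encoding the text twice or forgetting to encode it.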
Upvotes: 4