Reputation:
When I convert a sample string like this:
$str = "اوقات-شرعی-جمعه-8-مرداد-ماه-به-اÙÙ‚-اردبیل"
echo mb_convert_encoding($str, "ASCII");
from UTF-8 to ASCII the result should be this:
%D8%A8%D8%B2%D8%B1%DA%AF-%D8%AA%D8%B1%DB%8C%D9%86-%D9%88%D8%B1%D8%B2%D8%B4%DA%A9%D8%A7%D8%B1%D8%A7%D9%86-%D8%AA%D8%A7%D8%B1%DB%8C%D8%AE-%D8%A7%D9%84%D9%85%D9%BE%DB%8C%DA%A9%D8%AA%D8%B5%D8%A7%D9%88%DB%8C%D8%B1
But it's this:
?????????????????????-????????????????-??????????????????-8-?????????????????????-??????????????-?????????-?????????????-?????????????????????????
I'm really confused. Does anyone know what the problem is?
UPDATE: I also tried iconv:
echo iconv("UTF-8", "ASCII", $str), PHP_EOL;
But it says:
Notice: iconv(): Detected an illegal character in input string
Upvotes: -1
Views: 16717
Reputation: 1971
In my opinion, the problem here is that the input string is wrong and the conversion between ASCII and UTF-8 is unnecessary.
Let's start with this:
$out = '%D8%A8%D8%B2%D8%B1%DA%AF-%D8%AA%D8%B1%DB%8C%D9%86-%D9%88%D8%B1%D8%B2%D8%B4%DA%A9%D8%A7%D8%B1%D8%A7%D9%86-%D8%AA%D8%A7%D8%B1%DB%8C%D8%AE-%D8%A7%D9%84%D9%85%D9%BE%DB%8C%DA%A9%D8%AA%D8%B5%D8%A7%D9%88%DB%8C%D8%B1';
When we check the encoding of this string with
echo mb_detect_encoding($out);
then we can see that it is ASCII, of course. But the string evidently looks like the output of the urlencode function. Let's use the urldecode function and check the encoding of the decoded value:
$decoded = urldecode($out);
echo mb_detect_encoding($decoded);
The output shows that $decoded is UTF-8, so running this code from the question
$str = "اوقات-شرعی-جمعه-8-مرداد-ماه-به-اÙÙ‚-اردبیل"
echo mb_convert_encoding($str, "ASCII");
makes no sense, because these characters cannot be represented in ASCII.
I was also curious about the encoding of $str from the question, so I wrote the following to see whether I could get the $str value from the $decoded value:
foreach (mb_list_encodings() as $chr) {
    // convert the decoded UTF-8 value into every encoding mbstring supports
    $test = mb_convert_encoding($decoded, $chr, 'UTF-8');
}
I was surprised that I didn't find any encoding that gave me something similar to the $str value. I then tried more combinations, checking conversions like this:
foreach (mb_list_encodings() as $chr) {
    foreach (mb_list_encodings() as $chr2) {
        $test = mb_convert_encoding($decoded, $chr, $chr2);
    }
}
and I finally found some values that look similar but are not equal. I did the same with the original $str, but also without success (I didn't get the requested output from the question).
foreach (mb_list_encodings() as $chr) {
    foreach (mb_list_encodings() as $chr2) {
        // try with and without urlencode
        $test = urlencode(mb_convert_encoding($str, $chr, $chr2));
    }
}
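One possible reason the $str value looks the way it does, offered as an assumption and not something confirmed in this thread, is that UTF-8 bytes were read as a single-byte encoding somewhere along the way. The sketch below shows that mechanism on one hypothetical letter with ISO-8859-1. The string in the question also contains characters such as ˆ and ‚ that are not in ISO-8859-1, so its exact path may differ.

```php
<?php
// Hypothetical illustration: the letter U+0628 is the two bytes D8 A8 in UTF-8.
$original = "\u{0628}";

// Misreading those bytes as ISO-8859-1 and re-encoding them as UTF-8 gives "Ø¨".
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Reversing the step restores the original bytes.
$restored = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');

var_dump($garbled === "\u{00D8}\u{00A8}"); // bool(true)
var_dump($restored === $original);          // bool(true)
```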
Of course, when we do this
$newOutput = urlencode($decoded);
then we get the $out value.
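The round trip described above can be sketched on a shortened input. This example uses only the first four percent-encoded characters of the expected output from the question, and it uses mb_check_encoding instead of mb_detect_encoding so that each check is an explicit yes or no.

```php
<?php
// The first four percent-encoded characters of the expected output in the question.
$out = '%D8%A8%D8%B2%D8%B1%DA%AF';

$decoded = urldecode($out);

var_dump(mb_check_encoding($out, 'ASCII'));     // bool(true): the encoded form is plain ASCII
var_dump(mb_check_encoding($decoded, 'UTF-8')); // bool(true): the decoded bytes are valid UTF-8
var_dump(mb_check_encoding($decoded, 'ASCII')); // bool(false): the decoded bytes are not ASCII
var_dump(urlencode($decoded) === $out);         // bool(true): encoding again gives $out back
```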
The conclusion is that the conversion between ASCII and UTF-8 is unnecessary in this case, and the input string may be wrong (maybe because of some unnecessary conversion from UTF-8 to something I can't recognize).
Upvotes: 0
Reputation: 317
%D8 is not ASCII encoding. ASCII has 128 characters (or 256 if you're using an extended 8-bit set) (see http://www.asciitable.com/).
As such, special characters like Ø have no equivalent. mb_convert_encoding handles this by replacing them with a ?, whereas iconv reports an illegal character.
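The substitution behaviour can be seen in a minimal sketch. It assumes the mbstring extension is loaded and that its substitute character is left at the default; the sample string is hypothetical and written with code point escapes so it does not depend on how the file is saved.

```php
<?php
// Two Persian letters, a hyphen, and a digit.
$utf8 = "\u{0628}\u{0632}-8";

// Characters with no ASCII equivalent are replaced by the substitute character.
$ascii = mb_convert_encoding($utf8, 'ASCII', 'UTF-8');

echo $ascii, PHP_EOL; // ??-8
```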
The output you're after looks more like URL encoding. Try this:
echo urlencode("اوقات-شرعی-جمعه-8-مرداد-ماه-به-اÙÙ‚-اردبیل");
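As a small check that urlencode produces the %D8... style of output when the input really is UTF-8, here is a sketch with a hypothetical slug. It is the first word of the expected output in the question followed by "-8", written with code point escapes so the result does not depend on the file's encoding.

```php
<?php
// Four Persian letters (U+0628 U+0632 U+0631 U+06AF), a hyphen, and a digit.
$slug = "\u{0628}\u{0632}\u{0631}\u{06AF}-8";

// Each non-ASCII byte becomes %XX; the hyphen and the digit are left as they are.
$encoded = urlencode($slug);

echo $encoded, PHP_EOL; // %D8%A8%D8%B2%D8%B1%DA%AF-8
```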
Upvotes: 4