user6555950

Converting UTF-8 to ASCII

When I convert a sample string like this:

$str = "اوقات-شرعی-جمعه-8-مرداد-ماه-به-اÙÙ‚-اردبیل";
echo mb_convert_encoding($str, "ASCII");

from UTF-8 to ASCII the result should be this:

%D8%A8%D8%B2%D8%B1%DA%AF-%D8%AA%D8%B1%DB%8C%D9%86-%D9%88%D8%B1%D8%B2%D8%B4%DA%A9%D8%A7%D8%B1%D8%A7%D9%86-%D8%AA%D8%A7%D8%B1%DB%8C%D8%AE-%D8%A7%D9%84%D9%85%D9%BE%DB%8C%DA%A9%D8%AA%D8%B5%D8%A7%D9%88%DB%8C%D8%B1

But it's this:

?????????????????????-????????????????-??????????????????-8-?????????????????????-??????????????-?????????-?????????????-?????????????????????????

I'm really confused. Does anyone know what the problem is?

UPDATE: I also tried iconv:

echo iconv("UTF-8", "ASCII", $str), PHP_EOL;

But it says:

Notice: iconv(): Detected an illegal character in input string

Upvotes: -1

Views: 16717

Answers (2)

zajonc

Reputation: 1971

In my opinion the problem with this case is that the input string is wrong and the conversion between ASCII and UTF-8 is unnecessary.

Let's start with this:

$out = '%D8%A8%D8%B2%D8%B1%DA%AF-%D8%AA%D8%B1%DB%8C%D9%86-%D9%88%D8%B1%D8%B2%D8%B4%DA%A9%D8%A7%D8%B1%D8%A7%D9%86-%D8%AA%D8%A7%D8%B1%DB%8C%D8%AE-%D8%A7%D9%84%D9%85%D9%BE%DB%8C%DA%A9%D8%AA%D8%B5%D8%A7%D9%88%DB%8C%D8%B1';

When we check the encoding of this string with

echo mb_detect_encoding($out);

then we can see that it is ASCII, of course. But this string clearly looks like the output of the urlencode function. Let's use the urldecode function and check the encoding of the decoded value:

$decoded = urldecode($out);
echo mb_detect_encoding($decoded);

The output shows that $decoded is UTF-8, so running this code from the question

$str = "اوقات-شرعی-جمعه-8-مرداد-ماه-به-اÙÙ‚-اردبیل";
echo mb_convert_encoding($str, "ASCII");

makes no sense, because these characters cannot be represented in ASCII.

I was also curious about the encoding of $str from the question, so I wrote something like this to see whether I could get the $str value from the $decoded value:

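foreach (mb_list_encodings() as $chr) {
    // Convert the decoded UTF-8 text into every available encoding
    $test = mb_convert_encoding($decoded, $chr, 'UTF-8');
    // Print each result so it can be compared with $str by eye
    echo $chr, ': ', $test, PHP_EOL;
}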

I was surprised that I didn't find any encoding that gave me something similar to the $str value. I went further and checked conversions between every pair of encodings, as in this code:

foreach (mb_list_encodings() as $chr) {
    foreach (mb_list_encodings() as $chr2) {
        // Treat $decoded as $chr2 and convert it to $chr
        $test = mb_convert_encoding($decoded, $chr, $chr2);
        echo $chr2, ' -> ', $chr, ': ', $test, PHP_EOL;
    }
}

and I finally found some values that look similar but are not equal. I did the same with the original $str, but also without success (I didn't get the requested output from the question).

foreach (mb_list_encodings() as $chr) {
    foreach (mb_list_encodings() as $chr2) {
        // Try with and without urlencode
        $test = urlencode(mb_convert_encoding($str, $chr, $chr2));
        echo $chr2, ' -> ', $chr, ': ', $test, PHP_EOL;
    }
}

Of course, when we do this

$newOutput = urlencode($decoded);

then we get the $out value.
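
The round trip described above can be sketched on a shorter value. This is only an illustration: it assumes the first word of the expected output as the sample, and it relies on the standard urldecode, urlencode, and mbstring functions.

```php
<?php
// Assumption: only the first word of the expected output, to keep the example short.
$out = '%D8%A8%D8%B2%D8%B1%DA%AF';

// urldecode() turns each %XX triplet back into one raw byte.
$decoded = urldecode($out);

var_dump(mb_check_encoding($decoded, 'UTF-8')); // bool(true): the bytes form valid UTF-8
var_dump(strlen($decoded));                     // int(8): eight bytes
var_dump(mb_strlen($decoded, 'UTF-8'));         // int(4): four characters

// Encoding the decoded bytes again reproduces the original string.
var_dump(urlencode($decoded) === $out);         // bool(true)
```

The percent-encoded form is plain ASCII, while the decoded form is UTF-8 text that uses two bytes per character here.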

The conclusion is that the conversion between ASCII and UTF-8 is unnecessary in this case and that the input string may be wrong (maybe because of some unnecessary conversion from UTF-8 to something I can't recognize).
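
One possible explanation, offered only as a guess, is that the UTF-8 bytes were at some point read as Windows-1252 and encoded to UTF-8 again. The sketch below reproduces that kind of damage on a short sample word and reverses it. The sample word and the choice of Windows-1252 are assumptions, not something confirmed by the question.

```php
<?php
// Assumption: a short sample word, written with \u{...} escapes (PHP 7+).
$original = "\u{0628}\u{0632}\u{0631}\u{06AF}";

// Simulate the suspected mistake: treat the UTF-8 bytes as Windows-1252
// and convert them to UTF-8 a second time.
$garbled = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');
echo $garbled, PHP_EOL; // Latin-looking characters such as "Ø" and "Ú"

// Undo it: map each character back to its single Windows-1252 byte.
$repaired = mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
var_dump($repaired === $original); // bool(true)
```

This reversal only works when every byte of the text has a Windows-1252 mapping, and it says nothing certain about how the $str in the question was actually produced.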

Upvotes: 0

Alcinator

Reputation: 317

%D8 is not ASCII encoding. ASCII has 128 characters (or 256 if you're using an extended 8-bit set; see http://www.asciitable.com/).

As such, special characters like Ø have no equivalent. mb_convert_encoding handles this by replacing them with a ?, whereas iconv raises a notice.

The output you're after looks more like URL encoding. Try this:

echo urlencode("اوقات-شرعی-جمعه-8-مرداد-ماه-به-اÙÙ‚-اردبیل");
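
As a small, self-contained illustration of the difference, the sketch below runs both functions on a short sample. The sample word is an assumption (the first word of the expected output), written with \u{...} escapes so the result does not depend on how the file is saved, and the ? output assumes mbstring's default substitute character.

```php
<?php
// Assumption: a short sample word written with \u{...} escapes (PHP 7+),
// so the example does not depend on the encoding of this file.
$word = "\u{0628}\u{0632}\u{0631}\u{06AF}-8";

// urlencode() percent-encodes every byte except letters, digits, and - _ .
$encoded = urlencode($word);
echo $encoded, PHP_EOL; // %D8%A8%D8%B2%D8%B1%DA%AF-8

// The encoded form contains only ASCII bytes.
var_dump(mb_check_encoding($encoded, 'ASCII')); // bool(true)

// Converting to ASCII cannot represent these characters, so each one is
// replaced by the substitute character ('?' with default settings).
$ascii = mb_convert_encoding($word, 'ASCII', 'UTF-8');
echo $ascii, PHP_EOL; // ????-8
```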

Upvotes: 4
