Gauthier

Reputation: 1256

PHP mb_convert_encoding function doesn't work from ASCII to Latin-1

        // Re-encode the string from UTF-8 to Latin-1.
        // Signature: mb_convert_encoding($string, $to_encoding, $from_encoding)
        echo mb_detect_encoding($out);
        $out = mb_convert_encoding($out, "ISO-8859-1", "UTF-8");
        echo mb_detect_encoding($out);
        die;

The result printed on my page is:

ASCIIASCII

I already checked the list of supported encodings (http://php.net/manual/fr/mbstring.supported-encodings.php): Latin-1 is known as ISO-8859-1. But nothing changes...

---[EDIT]---

This is what I get when I print $out before calling mb_detect_encoding(). My string is correct. Could a php.ini setting be messing it up? I am not able to change it...

EDI_DC40 0000000000027262 2 SALESORDER_CREATEFROMDAT201 SALESORDER_CREATEFROMDAT2 330SOL 96A ORDERSTDX4 LS SERVEURDPL SAPP48 LS SERVEURDPL 1 E2SALESORDER_CREATEFROMDAT2 X E2BPSDHD1000 00000000000272621 YPR 4803 330 0230 20151002 20151002Z300 7134012207 71 20151002 20151002 E2BPSDITM000 00000000000272622 1 L7820100 9 E2BPSDITM000 00000000000272623 2 L7820400 6 E2BPSDITM000 00000000000272624 3 L9188000 5 E2BPPARNR000 00000000000272625 AG0000510001 E2BPPARNR000 00000000000272626 WE0000510001 E2BPPARNR000 00000000000272627 LQ0000030590 E2BPPARNR000 00000000000272628 ZQ0000990238 E2BPSCHDL000 00000000000272629 1 9 E2BPSCHDL000 000000000002726210 2 6 E2BPSCHDL000 000000000002726211 3 5 E2BPSDTEXT000 000000000002726212 FR E2BPPAREX000 000000000002726213 BAPE_VBAK LX2 E2BPPAREX000 000000000002726214 BAPE_VBAKX X

ASCII ASCII

---[EDIT 2]---

I still have some issues getting my file encoded in ISO-8859-1. I just added $out = utf8_decode($out); before generating my file:

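    // Open the file for writing (creates it, or truncates an existing one)
    $fileHandle   = fopen($filePath, "w");
    // fwrite() returns the number of bytes written, or false on failure
    $bytesWritten = fwrite($fileHandle, $out);
    fclose($fileHandle);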

When I add "Ô" at the end of the $out variable, the file is recognized as Latin-1 and the ô is displayed correctly. When I add it in the middle of my file, the document is recognized as UTF-8 and the "ô" is displayed incorrectly (�).
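A minimal sketch of the file-writing step, assuming the source string is UTF-8 and the target is ISO-8859-1. The sample `$out` value and the temp-file path are hypothetical. It uses mb_convert_encoding() rather than utf8_decode(), which does the same conversion for characters that exist in Latin-1 but is deprecated as of PHP 8.2. A plain text file carries no encoding label, so an editor opening it can only guess; checking the written bytes directly is a more reliable test.

```php
<?php
// Hypothetical sample content; in the question $out holds the EDI string.
$out = "EDI_DC40 0000000000027262 \xC3\x94 FR"; // "Ô" encoded as UTF-8 (C3 94)

// Convert from UTF-8 to ISO-8859-1 before writing (argument order: string, to, from).
$latin1 = mb_convert_encoding($out, 'ISO-8859-1', 'UTF-8');

// Hypothetical output path in the system temp directory.
$filePath = sys_get_temp_dir() . '/edi_latin1.txt';
if (file_put_contents($filePath, $latin1) === false) {
    throw new RuntimeException("Could not write $filePath");
}

// The file now holds a single 0xD4 byte for "Ô" instead of the two UTF-8 bytes.
$written = file_get_contents($filePath);
var_dump(strpos($written, "\xD4") !== false);   // bool(true)
var_dump(mb_check_encoding($written, 'UTF-8')); // bool(false): no longer valid UTF-8
```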

Upvotes: 2

Views: 6233

Answers (2)

deceze

Reputation: 522005

As written here:

Strings have no actual associated encoding; they're merely byte arrays. mb_detect_encoding doesn't tell you which encoding the string has, it merely tries to detect it. That means it takes a few guesses (the list in the second argument, or the default detection order if you omit it) and tells you the first one that is valid.
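To illustrate that the bytes carry no encoding of their own, here is a small sketch: the same two bytes read as one character under UTF-8 and as two characters under ISO-8859-1, and both readings pass mb_check_encoding(), so a detector can only guess.

```php
<?php
// The same two bytes, 0xC3 0xB4, with no encoding attached to them.
$bytes = "\xC3\xB4";

// Read as UTF-8, they form a single character: "ô" (U+00F4).
$asUtf8 = mb_strlen($bytes, 'UTF-8');

// Read as ISO-8859-1, each byte is its own character: "Ã" and "´".
$asLatin1 = mb_strlen($bytes, 'ISO-8859-1');

echo $asUtf8, "\n";   // 1
echo $asLatin1, "\n"; // 2

// Both interpretations are valid.
var_dump(mb_check_encoding($bytes, 'UTF-8'));      // bool(true)
var_dump(mb_check_encoding($bytes, 'ISO-8859-1')); // bool(true)
```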

If your original string is ASCII, it's also already valid Latin-1, UTF-8 and a whole bunch of other encodings, which are all supersets of ASCII. Converting it won't actually change anything. mb_detect_encoding reports it as ASCII because that's the first valid match, and that's as valid an answer as virtually anything else.
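A quick sketch of this point, using a shortened sample of the question's string: converting an ASCII-only string from UTF-8 to ISO-8859-1 yields identical bytes, whereas a non-ASCII character such as "ô" actually changes. Note the argument order of mb_convert_encoding(): the target encoding comes before the source encoding.

```php
<?php
// An ASCII-only sample, shortened from the string in the question.
$ascii = 'EDI_DC40 0000000000027262';

// Convert from UTF-8 to ISO-8859-1 (argument order: string, to, from).
$latin1 = mb_convert_encoding($ascii, 'ISO-8859-1', 'UTF-8');

// Every ASCII byte is encoded identically in both encodings, so nothing changes.
var_dump($latin1 === $ascii); // bool(true)

// A non-ASCII character does change: "ô" is 2 bytes in UTF-8, 1 byte in Latin-1.
$utf8 = "\xC3\xB4"; // "ô" encoded as UTF-8
$converted = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
echo bin2hex($converted), "\n"; // f4
```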

If you require Latin-1 and you want to confirm that your string is valid in the Latin-1 encoding, use mb_check_encoding($str, 'ISO-8859-1').
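A sketch of that check. One caveat, based on how mbstring defines ISO-8859-1: every byte value from 0x00 to 0xFF maps to a Latin-1 character, so the Latin-1 check succeeds for any byte string. Checking whether the data is valid UTF-8 is often the more telling test.

```php
<?php
$utf8Text   = "\xC3\xB4"; // "ô" encoded as UTF-8
$latin1Text = "\xF4";     // "ô" encoded as ISO-8859-1

// Every byte is a valid Latin-1 character, so both strings pass.
var_dump(mb_check_encoding($utf8Text, 'ISO-8859-1'));   // bool(true)
var_dump(mb_check_encoding($latin1Text, 'ISO-8859-1')); // bool(true)

// UTF-8 has structural rules, so a lone 0xF4 byte is rejected.
var_dump(mb_check_encoding($utf8Text, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($latin1Text, 'UTF-8')); // bool(false)
```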

Maybe start reading here to understand more: What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text

Upvotes: 4

ptkoz

Reputation: 2487

First, note that PHP strings do not have any special 'charset' property. Encoding detection is based entirely on a byte-by-byte analysis of the string.

Your string contains only ASCII characters, so it is valid in any ASCII-compatible encoding you choose, and it is detected as ASCII because ASCII has higher priority in the detection order.

mb_detect_encoding compares the string's bytes against each encoding listed in the second argument (which defaults to mb_detect_order()) and returns the first encoding in which all of the string's bytes form valid characters.
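The rule described above can be modelled with mb_check_encoding(). The helper firstValidEncoding() is hypothetical and only a simplified model of that rule, not PHP's actual implementation (which may apply additional heuristics in newer versions), but it reproduces the results of the examples that follow.

```php
<?php
/**
 * Hypothetical helper: a simplified model of "first valid encoding wins".
 * Returns the first encoding in $order for which $str is valid,
 * or false if none match.
 */
function firstValidEncoding(string $str, array $order)
{
    foreach ($order as $encoding) {
        if (mb_check_encoding($str, $encoding)) {
            return $encoding;
        }
    }
    return false;
}

$sample = 'EDI_DC40 0000000000027262';

var_dump(firstValidEncoding($sample, ['ASCII', 'UTF-8'])); // string(5) "ASCII"
var_dump(firstValidEncoding($sample, ['UTF-8', 'ASCII'])); // string(5) "UTF-8"

// "\xC5\xBA" is "ź" in UTF-8; those bytes are not valid ASCII.
var_dump(firstValidEncoding("\xC5\xBA" . $sample, ['ASCII', 'UTF-8'])); // string(5) "UTF-8"
```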

A few examples (I've shortened your string for readability):

    $order = mb_detect_order();
    $encoding = mb_detect_encoding('EDI_DC40 0000000000027262', $order, true);

    var_dump($order);
    // array(2) { [0]=> string(5) "ASCII", [1]=> string(5) "UTF-8" }
    var_dump($encoding);
    // string(5) "ASCII"

Now let's reverse the order.

    $order = ['UTF-8', 'ASCII'];
    $encoding = mb_detect_encoding('EDI_DC40 0000000000027262', $order, true);

    var_dump($order);
    // array(2) { [0]=> string(5) "UTF-8", [1]=> string(5) "ASCII" }
    var_dump($encoding);
    // string(5) "UTF-8"

Now let's put a non-ASCII character into your string (this assumes the PHP source file is saved as UTF-8, so 'ź' is stored as UTF-8 bytes). In this situation mb_detect_encoding realizes that the string is not ASCII and checks it against UTF-8.

    $order = mb_detect_order();
    $encoding = mb_detect_encoding('źEDI_DC40 0000000000027262', $order, true);

    var_dump($order);
    // array(2) { [0]=> string(5) "ASCII", [1]=> string(5) "UTF-8" }
    var_dump($encoding);
    // string(5) "UTF-8"

Because your string contains only ASCII characters, you can safely display, save and edit it as ASCII, even if it comes from a UTF-8 source.

Upvotes: 2
