Reputation: 1256
//reencoding string from UTF-8 to Latin1
echo mb_detect_encoding($out);
$out = mb_convert_encoding($out, mb_detect_encoding($out),"ISO-8859-1");
echo mb_detect_encoding($out);
die;
The result printed on my page is:
ASCIIASCII
I already checked the supported encodings (http://php.net/manual/fr/mbstring.supported-encodings.php): Latin-1 is known as ISO-8859-1. But nothing changes...
---[EDIT]---
This is what I get when I print $out before the mb_detect_encoding() call. My string is correct. Could a php.ini setting be messing it up? I am not able to change it...
EDI_DC40 0000000000027262 2 SALESORDER_CREATEFROMDAT201 SALESORDER_CREATEFROMDAT2 330SOL 96A ORDERSTDX4 LS SERVEURDPL SAPP48 LS SERVEURDPL 1 E2SALESORDER_CREATEFROMDAT2 X E2BPSDHD1000 00000000000272621 YPR 4803 330 0230 20151002 20151002Z300 7134012207 71 20151002 20151002 E2BPSDITM000 00000000000272622 1 L7820100 9 E2BPSDITM000 00000000000272623 2 L7820400 6 E2BPSDITM000 00000000000272624 3 L9188000 5 E2BPPARNR000 00000000000272625 AG0000510001 E2BPPARNR000 00000000000272626 WE0000510001 E2BPPARNR000 00000000000272627 LQ0000030590 E2BPPARNR000 00000000000272628 ZQ0000990238 E2BPSCHDL000 00000000000272629 1 9 E2BPSCHDL000 000000000002726210 2 6 E2BPSCHDL000 000000000002726211 3 5 E2BPSDTEXT000 000000000002726212 FR E2BPPAREX000 000000000002726213 BAPE_VBAK LX2 E2BPPAREX000 000000000002726214 BAPE_VBAKX X
ASCII ASCII
[EDIT2]
I still have some issues getting my file encoded in ISO-8859-1.
I just added $out = utf8_decode($out);
before generating my file:
$strFileWrite = fopen($filePath, "w");
$strWritableFile = fwrite($strFileWrite, $out);
fclose($strFileWrite);
When I add "Ô" at the end of the $out variable, the file is recognized as Latin-1 and the "ô" is printed correctly. When I add it in the middle of my file, the document is recognized as UTF-8 and the character "ô" is printed incorrectly ( � ).
Upvotes: 2
Views: 6233
Reputation: 522005
As written here:
Strings have no actual associated encoding, they're merely byte arrays.
mb_detect_encoding doesn't tell you what encoding the string has, it merely tries to detect it. That means it takes a few guesses (your second argument) and tells you the first that is valid.
If your original string is ASCII, it's already also valid Latin-1, UTF-8 and a whole bunch of other encodings for that matter, which are all supersets of ASCII. Converting it won't actually change anything. mb_detect_encoding preferably detects it as ASCII, since it's the first valid match, and it's as valid an answer as virtually anything else.
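To illustrate the point, here is a minimal sketch, assuming the mbstring extension is loaded. The sample strings are shortened or made up for illustration. It shows that converting a pure-ASCII string from UTF-8 to ISO-8859-1 leaves the bytes untouched, while a string containing "ô" really does change.

```php
<?php
// A pure-ASCII sample, shortened from the string in the question.
$ascii = 'EDI_DC40 0000000000027262';

// Argument order is (string, to_encoding, from_encoding).
$latin1 = mb_convert_encoding($ascii, 'ISO-8859-1', 'UTF-8');

// The bytes are identical, so there is nothing new to detect.
var_dump($latin1 === $ascii); // bool(true)

// Illustrative string "côte" written as explicit UTF-8 bytes (ô = C3 B4).
$utf8 = "c\xC3\xB4te";
$converted = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

var_dump(bin2hex($utf8));      // string(10) "63c3b47465"
var_dump(bin2hex($converted)); // string(8) "63f47465"
```

Comparing the raw bytes with bin2hex() is a more direct check than asking mb_detect_encoding for its guess.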
If you require Latin-1 and you want to confirm that your string is valid in the Latin-1 encoding, use mb_check_encoding($str, 'ISO-8859-1').
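A rough sketch of how such validity checks behave, assuming the mbstring extension is available. The sample strings are illustrative only. Because single-byte encodings such as Latin-1 give a meaning to almost every byte value, checking against ASCII or UTF-8 as well is often the more telling test.

```php
<?php
$ascii   = 'EDI_DC40 0000000000027262'; // bytes below 0x80 only
$utf8    = "c\xC3\xB4te";               // "côte" as UTF-8 bytes
$latin1  = "c\xF4te";                   // "côte" as ISO-8859-1 bytes

// A pure-ASCII string passes all three checks.
var_dump(mb_check_encoding($ascii, 'ASCII'));      // bool(true)
var_dump(mb_check_encoding($ascii, 'UTF-8'));      // bool(true)
var_dump(mb_check_encoding($ascii, 'ISO-8859-1')); // bool(true)

// A byte >= 0x80 is never valid ASCII.
var_dump(mb_check_encoding($utf8, 'ASCII'));       // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));       // bool(true)

// A lone 0xF4 followed by "t" is not a well-formed UTF-8 sequence.
var_dump(mb_check_encoding($latin1, 'UTF-8'));     // bool(false)
var_dump(mb_check_encoding($latin1, 'ISO-8859-1')); // bool(true)
```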
Maybe start reading here to understand more: What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text
Upvotes: 4
Reputation: 2487
First, please note that PHP strings do not have any special 'charset' property. Encoding detection is based entirely on byte-by-byte analysis of the string.
Your string contains only ASCII characters, so whichever ASCII-compatible encoding you choose, the bytes are the same as in ASCII (and the string is detected as ASCII because ASCII comes first in the detection order).
mb_detect_encoding compares the string's bytes against each encoding specified in the second argument (which defaults to mb_detect_order()) and returns the first encoding in which all bytes of the string are valid.
A few examples (I've shortened your string for readability):
$order = mb_detect_order();
$encoding = mb_detect_encoding('EDI_DC40 0000000000027262', $order, true);
var_dump($order);
// array(2) { [0]=>string(5) "ASCII", [1]=> string(5) "UTF-8" }
var_dump($encoding);
// string(5) "ASCII"
Now let's reverse the order.
$order = [0 => 'UTF-8', 1 => 'ASCII'];
$encoding = mb_detect_encoding('EDI_DC40 0000000000027262', $order, true);
var_dump($order);
// array(2) { [0]=>string(5) "UTF-8", [1]=> string(5) "ASCII" }
var_dump($encoding);
// string(5) "UTF-8"
And now let's put a non-ASCII character into your string. In this situation mb_detect_encoding will realize that this is not an ASCII string and will check it against UTF-8.
$order = mb_detect_order();
$encoding = mb_detect_encoding('źEDI_DC40 0000000000027262', $order, true);
var_dump($order);
// array(2) { [0]=>string(5) "ASCII", [1]=> string(5) "UTF-8" }
var_dump($encoding);
// string(5) "UTF-8"
Because your string contains only ASCII-compatible characters, you can safely display, save and edit it as ASCII, even if it comes from a UTF-8 source.
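If non-ASCII characters do end up in the data, one possible approach is to convert explicitly and then inspect the bytes that were written, rather than relying on detection. The following is only a sketch under assumptions: the sample string and the temporary file path are hypothetical, the input is assumed to be valid UTF-8, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical sample: mostly ASCII, with one "ô" encoded as UTF-8 (C3 B4).
$out = "EDI_DC40 c\xC3\xB4te 0000000000027262";

// Name both encodings explicitly: (string, to_encoding, from_encoding).
$latin1 = mb_convert_encoding($out, 'ISO-8859-1', 'UTF-8');

// Hypothetical output location for this sketch.
$filePath = tempnam(sys_get_temp_dir(), 'edi');
file_put_contents($filePath, $latin1);

// Read the file back and look at the raw bytes.
$bytes = file_get_contents($filePath);

// "ô" takes two bytes in UTF-8 and one in ISO-8859-1.
var_dump(strlen($out) - strlen($bytes));      // int(1)
var_dump(strpos($bytes, "\xF4") !== false);   // bool(true)
var_dump(strpos($bytes, "\xC3\xB4") !== false); // bool(false)

unlink($filePath);
```

Whether a text editor then shows the file as Latin-1 or UTF-8 is the editor's own guess, so the byte-level check above is the more reliable confirmation.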
Upvotes: 2