PHP mb_detect_encoding no longer reliably detects UTF-8

Question

I recently switched from PHP 7 to PHP 8.2.7 and noticed that mb_detect_encoding no longer seems to work reliably. I am doing the following:

mb_detect_encoding(file_get_contents('somefile.csv'), 'UTF-8, ISO-8859-1', true);

For one particular file, the returned value is ISO-8859-1, even though the file is clearly UTF-8 encoded; it even starts with a UTF-8 BOM. I understand that the detection algorithm changed in PHP 8, but how can a clearly UTF-8 encoded file be mistaken for ISO-8859-1? I get that most UTF-8 encoded strings are also valid ISO-8859-1 strings, but what good is mb_detect_encoding if it fails to see the difference? By the way, the file in question is 1759 bytes long and contains around 30 two-byte UTF-8 characters, so in my opinion this should be plenty to detect it as UTF-8.

I cannot upload files, but this is the first line of the .CSV file:

Buchungstag;Wertstellung;Umsatzart;Buchungstext;Betrag;Währung;Auftraggeberkonto;Bankleitzahl Auftraggeberkonto;IBAN Auftraggeberkonto
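
To make concrete what "clearly UTF-8" means here, this is a minimal sketch of the two checks the file passes: it starts with the UTF-8 BOM bytes (EF BB BF) and it validates as UTF-8. The helper name describeUtf8 and the shortened sample strings are hypothetical and only for illustration. The sketch assumes the mbstring extension is loaded and does not reproduce the mb_detect_encoding result itself.

```php
<?php
// Hypothetical helper (not part of mbstring): reports whether the bytes
// start with a UTF-8 BOM and whether they form a valid UTF-8 sequence.
function describeUtf8(string $bytes): array
{
    return [
        // The UTF-8 BOM is the three-byte sequence EF BB BF.
        'has_bom'    => strncmp($bytes, "\xEF\xBB\xBF", 3) === 0,
        // mb_check_encoding validates the byte sequence; it does not guess.
        'valid_utf8' => mb_check_encoding($bytes, 'UTF-8'),
    ];
}

// Shortened sample modelled on the header line: BOM plus "Währung",
// with "ä" written as its two UTF-8 bytes C3 A4.
$utf8Sample = "\xEF\xBB\xBF" . "Betrag;W\xC3\xA4hrung";

// The same word as ISO-8859-1 would encode it: "ä" is the single byte E4.
$latin1Sample = "Betrag;W\xE4hrung";

var_dump(describeUtf8($utf8Sample));
var_dump(describeUtf8($latin1Sample));
```

The ISO-8859-1 variant fails the UTF-8 validity check because the byte E4 would have to be followed by continuation bytes, which is why I would expect a strict detection to be able to tell the two apart.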


