Reputation: 1647
I have an XML file that does not declare its encoding (also called charset, character encoding, character set, character map, codeset, or code page). Here is an example of a declaration that does:
<?xml version="1.0" encoding="UTF-8"?>
The XML is being generated by a Perl script and the following is an excerpt:
$fileName = $exportDirectory . $fileName;
open FILE, ">$fileName" or die;
The questions:
I tried to use LibXML:
perl -MXML::LibXML -e 'XML::LibXML->load_xml(location => "2.xml")'
2.xml:1364531: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xBF 0x30 0x39 0x20
female presented in spring �09 due t
                           ^
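The parser message above says the file contains a byte (0xBF) that cannot start a valid UTF-8 sequence. As a rough illustration of what that check amounts to, here is a minimal Python sketch that reports the first byte breaking strict UTF-8 decoding. The function name `first_invalid_utf8` and the sample bytes are hypothetical, chosen only to mirror the error text; this is not what libxml2 does internally.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first byte that breaks strict
    UTF-8 decoding, or None if the whole buffer is valid UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first undecodable byte.
        return exc.start, data[exc.start]
    return None


# Hypothetical sample mirroring the parser message: 0xBF followed by "09 ".
sample = b"female presented in spring \xbf09 due t"
print(first_invalid_utf8(sample))  # (27, 191), i.e. offset 27, byte 0xBF
```

To run the same check on the real file, one could read it in binary mode, for example `first_invalid_utf8(open("2.xml", "rb").read())`.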
I hope I supplied sufficient information. Please let me know if further information is needed.
Upvotes: 0
Views: 214
Reputation: 9421
You may have to compile enca yourself. As for chardet, there is a chance your distribution's package repository already provides it as a packaged script.
Enca works only for European languages and requires you to tell it which language the file is in. Chardet is worse at distinguishing between European languages stored in 8-bit encodings, but performs better with non-European text.
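If neither tool is at hand, a very crude stand-in is to try a short list of candidate encodings in order and keep the first one that decodes without error. The sketch below is only that: it is not how enca or chardet work (both use statistical heuristics), the function name `decode_with_candidates` and the candidate list are assumptions, and a successful decode does not prove the guess is right. Note that `latin-1` accepts every byte, so it only makes sense as the last candidate.

```python
def decode_with_candidates(data: bytes,
                           candidates=("utf-8", "cp1252", "latin-1")):
    """Return (encoding, text) for the first candidate encoding that
    decodes `data` without error. Raises ValueError if none succeeds."""
    for encoding in candidates:
        try:
            return encoding, data.decode(encoding)
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes; try the next
    raise ValueError("none of the candidate encodings could decode the data")


# Hypothetical sample containing the 0xBF byte from the question.
encoding, text = decode_with_candidates(b"spring \xbf09")
print(encoding, text)

# Once an encoding has been settled on, the text can be rewritten as UTF-8.
utf8_bytes = text.encode("utf-8")
```

Once the encoding is known, the file can either be transcoded to UTF-8 as in the last line, or the generating script can state the actual encoding in the XML declaration.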
Upvotes: 1