I am trying to read an HTML file with the Perl module File::Slurp:
binmode STDOUT, ':utf8';
my $htmlcontent = read_file($file, {binmode => ':utf8'});
But when I print the $htmlcontent variable, some characters are not displayed correctly, apparently because of French accents or other special characters.
For example, "Plus d'actualit\u00e9s"
should be "Plus d'actualités".
I also checked the encoding of the file, and it looks fine:
HTML document, UTF-8 Unicode text, with very long lines, with CRLF, LF line terminators
Is there a problem with this module?
Thanks
Upvotes: 2
Views: 670
Reputation: 2520
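\u00e9 is not a UTF-8 character; it is the JavaScript escape sequence for a Unicode character (here U+00E9, "é"). The file really does contain those six ASCII characters, so File::Slurp is reading it correctly. You need to decode the content of the file after reading it, for example with Encode::JavaScript::UCS.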
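If installing an extra module is not an option, the same decoding can be approximated with a regular expression in core Perl. This is only a minimal sketch, not the Encode::JavaScript::UCS API: the helper name decode_js_escapes is hypothetical, and it assumes every \uXXXX sequence in the text is a real escape (it does not treat a doubled backslash specially).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Slurp or Encode::JavaScript::UCS):
# replaces JavaScript-style \uXXXX escapes with the characters they denote.
sub decode_js_escapes {
    my ($text) = @_;

    # A high surrogate followed by a low surrogate encodes a single
    # code point above U+FFFF, so combine the pair first.
    $text =~ s/\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})/
        chr(0x10000 + ((hex($1) - 0xD800) << 10) + (hex($2) - 0xDC00))/ge;

    # Every remaining escape maps directly to one code point.
    $text =~ s/\\u([0-9a-fA-F]{4})/chr(hex($1))/ge;

    return $text;
}

# Encode output as UTF-8 so the decoded characters print correctly.
binmode STDOUT, ':encoding(UTF-8)';

# q{} keeps the backslash literal, as it would be in the file content.
my $raw = q{Plus d'actualit\u00e9s};
print decode_js_escapes($raw), "\n";
```

In the question's code, the same call would be applied to $htmlcontent right after read_file and before printing.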
Upvotes: 2