user4960633

Reading UTF-8 files with File::Slurp

I am trying to read an HTML file with the Perl module File::Slurp:

use File::Slurp qw(read_file);

binmode STDOUT, ':utf8';
my $htmlcontent = read_file($file, { binmode => ':utf8' });

But when I print the $htmlcontent variable, some characters are not displayed correctly, namely French accented letters and other special characters.

For example, "Plus d'actualit\u00e9s" is printed where I expect "Plus d'actualités".

I also checked the encoding of the file, and it looks correct:

HTML document, UTF-8 Unicode text, with very long lines, with CRLF, LF line terminators
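A file can be valid UTF-8 and still contain \u00e9 as plain ASCII text, so an encoding check alone cannot tell the two cases apart. The following is a minimal sketch using core Perl only; the sample file and its contents are hypothetical. It writes one real UTF-8 "é" and one \u00e9 escape, reads the file back through a UTF-8 decoding layer (similar in spirit to read_file's binmode option), and counts the characters of each.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample file: one real UTF-8 "é" (bytes C3 A9)
# and one JavaScript-style escape (six ASCII characters).
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh, ':raw';
print {$fh} "real: actualit\xC3\xA9s\n", 'escaped: actualit\u00e9s', "\n";
close $fh or die "close: $!";

# Read it back through a UTF-8 decoding layer.
open my $in, '<:encoding(UTF-8)', $path or die "open $path: $!";
chomp(my @lines = <$in>);
close $in;

my ($real)    = $lines[0] =~ /^real: (.*)$/;
my ($escaped) = $lines[1] =~ /^escaped: (.*)$/;

printf "real:    %d characters\n", length $real;     # 10: "é" decodes to one character
printf "escaped: %d characters\n", length $escaped;  # 15: "\u00e9" stays six characters
```

If the decoded text keeps the longer length, the escapes are literal text in the file and have to be expanded separately after reading; no choice of binmode will turn them into accented characters.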

Is there a problem with this module?

Thanks

Answers (1)

Denis Ibaev

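\u00e9 is not a UTF-8 character; it is the JavaScript escape notation for a Unicode character (here U+00E9, "é"). Your file literally contains the six ASCII characters \u00e9, so File::Slurp is reading it correctly. You need to decode these escapes after reading the file, for example with Encode::JavaScript::UCS.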
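Besides Encode::JavaScript::UCS, the escapes can be expanded without extra dependencies using a regular expression. This is a swapped-in technique, not that module's implementation, and the helper name unescape_js_ucs is hypothetical. The sketch assumes every \uXXXX in the text is a real escape (it does not special-case an escaped backslash such as \\u00e9) and combines UTF-16 surrogate pairs so that characters outside the Basic Multilingual Plane come out as one code point.

```perl
use strict;
use warnings;

# Hypothetical helper: expand JavaScript-style \uXXXX escapes into characters.
# Assumes every \uXXXX is a real escape (an escaped backslash, \\u00e9, is not handled).
sub unescape_js_ucs {
    my ($text) = @_;
    # Combine UTF-16 surrogate pairs (e.g. \uD83D\uDE00) into one code point first.
    $text =~ s{\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})}
              {chr(0x10000 + ((hex($1) - 0xD800) << 10) + (hex($2) - 0xDC00))}ge;
    # Then expand the remaining BMP escapes (e.g. \u00e9 -> U+00E9).
    $text =~ s{\\u([0-9a-fA-F]{4})}{chr(hex($1))}ge;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
my $fixed = unescape_js_ucs(q{Plus d'actualit\u00e9s});
print "$fixed\n";    # prints: Plus d'actualités
```

In the question's code, this would be applied right after the read_file call, as $htmlcontent = unescape_js_ucs($htmlcontent);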
