Reputation: 38748
I need to read a file encoded in iso-8859-1.
For some reason I can't get the encoding layer (as described in PerlIO::encoding) to work. Here's a minimal example of what I am doing.
test.txt contains a single pound sign encoded in iso-8859-1.
% iconv -f iso-8859-1 test.txt
£
% hexdump -C test.txt
00000000 a3 0a |..|
00000002
My Perl script:
#!/bin/perl
use warnings;
use strict;
open my $f, "<:encoding(iso-8859-1)", $ARGV[0] or die qq{Could not open $ARGV[0]: $!};
while (<$f>) {
    print;
}
Result:
% ./script.pl test.txt | hexdump -C
00000000 a3 0a |..|
00000002
So the script prints the exact byte sequence it reads, with no conversion performed.
Upvotes: 4
Reputation: 385917
A string is a sequence of (32-bit or 64-bit) numbers.
In a string containing decoded text, those numbers are Unicode Code Points. Since byte A3 represents Unicode Code Point U+00A3 under iso-8859-1, decode("iso-8859-1", "\xA3") therefore returns "\xA3".
You proceeded to print that string, and print("\xA3") on a file handle with no encoding layers produces the byte A3 (since it expects a string of bytes).
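To make the distinction between decoded text and encoded bytes concrete, here is a minimal sketch using the core Encode module. The variable names are only illustrative; it assumes nothing beyond a standard Perl installation.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Byte A3 is the pound sign in iso-8859-1.
my $bytes = "\xA3";

# Decoding yields a one-character string whose code point is U+00A3.
my $text = decode('iso-8859-1', $bytes);

# Encoding that same character as UTF-8 yields the two bytes C2 A3.
my $utf8 = encode('UTF-8', $text);

printf "decoded: %d char(s), code point U+%04X\n", length($text), ord($text);
printf "as UTF-8: %s\n",
    join ' ', map { sprintf('%02x', ord($_)) } split //, $utf8;
```

The decoded string and the original byte string look identical ("\xA3"), which is why printing without an output layer appears to do no conversion.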
You didn't specify what you wanted to do, but I'm guessing you wanted the program to convert the input from iso-8859-1 to UTF-8. To achieve that, add
use open ':std', ':encoding(locale)';
or
use open ':std', ':encoding(UTF-8)';
These add an encoding layer to STDIN, STDOUT and STDERR (using binmode), and they set the default encoding layer for open in scope.
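The combined effect of an input layer and an output layer can be sketched with in-memory file handles. This is only an illustration: $input stands in for test.txt and $buf stands in for STDOUT, and both handles are given explicit layers instead of relying on the open pragma.

```perl
use strict;
use warnings;

# Stand-in for test.txt: a pound sign in iso-8859-1, then a newline.
my $input = "\xA3\n";

# Reading decodes iso-8859-1 bytes into characters.
open my $in, '<:encoding(iso-8859-1)', \$input
    or die "Could not open input: $!";

# Writing encodes characters as UTF-8 bytes (stand-in for STDOUT).
open my $out, '>:encoding(UTF-8)', \my $buf
    or die "Could not open output: $!";

while (<$in>) {
    print {$out} $_;
}
close $out;
close $in;

# Show the bytes that would have reached the terminal.
print join(' ', map { sprintf('%02x', ord($_)) } split //, $buf), "\n";
# prints "c2 a3 0a"
```

With the output layer in place, the single input byte A3 comes out as the UTF-8 sequence C2 A3, which is the conversion the question was expecting.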
Upvotes: 4
Reputation: 38748
I was assuming that file handles not declared with a specific encoding use the utf-8 encoding by default, but apparently that isn't true.
Adding an explicit
binmode(STDOUT, ":utf8");
fixes the problem.
Upvotes: 5