Roman Cheplyaka

Why doesn't Perl's encoding layer have any effect?

I need to read a file encoded in iso-8859-1.

For some reason I can't get the encoding layer (as described in PerlIO::encoding) to work. Here's a minimal example of what I am doing.

test.txt contains a single pound sign encoded in iso-8859-1.

% iconv -f iso-8859-1 test.txt
£

% hexdump -C test.txt
00000000  a3 0a                                             |..|
00000002

My Perl script:

#!/bin/perl

use warnings;
use strict;

open my $f, "<:encoding(iso-8859-1)", $ARGV[0] or die qq{Could not open $ARGV[0]: $!};

while (<$f>) {
  print;
}

Result:

% ./script.pl test.txt | hexdump -C
00000000  a3 0a                                             |..|
00000002

So the script prints the exact byte sequence it reads, with no conversion performed.

Answers (2)

ikegami

A string is a sequence of (32-bit or 64-bit) numbers.

In a string containing decoded text, those numbers are Unicode code points. Byte A3 represents code point U+00A3 (POUND SIGN) under iso-8859-1, so decode("iso-8859-1", "\xA3") returns "\xA3".
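To see that decoding really does happen, here is a small sketch that calls Encode's `decode` directly on the byte from test.txt (the newline is left out for brevity). The decoded string has exactly one character, U+00A3, which is the same number as the input byte. That is why the output looks unchanged.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Contents of test.txt without the newline: the single byte A3.
my $bytes = "\xA3";

# iso-8859-1 maps every byte to the code point with the same number,
# so decoding yields a one-character string containing U+00A3.
my $text = decode('iso-8859-1', $bytes);

printf "length: %d, code point: U+%04X\n", length $text, ord $text;
print $text eq "\xA3" ? "same number as the input byte\n" : "different\n";
```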

You then printed that string, and print("\xA3") on a file handle with no encoding layers produces the byte A3 (such a handle expects a string of bytes).
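The difference between a handle without layers and one with a UTF-8 encoding layer can be observed with in-memory file handles (opening a reference to a scalar), so no files or terminal are involved. This is an illustrative sketch; the variable names and the `hex_bytes` helper are hypothetical and not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: format a byte string as hex, e.g. "c2 a3".
sub hex_bytes { join ' ', map { sprintf('%02x', ord($_)) } split //, $_[0] }

my $text = "\xA3";    # decoded text: one character, U+00A3 POUND SIGN

# No encoding layer: each character is written as a single byte.
open my $raw_fh, '>', \my $raw_out or die "Cannot open in-memory handle: $!";
print {$raw_fh} $text;
close $raw_fh;

# UTF-8 encoding layer: the character is encoded before being written.
open my $utf8_fh, '>:encoding(UTF-8)', \my $utf8_out
    or die "Cannot open in-memory handle: $!";
print {$utf8_fh} $text;
close $utf8_fh;    # closing flushes the encoding layer's buffer

print 'no layer:        ', hex_bytes($raw_out),  "\n";    # a3
print 'encoding(UTF-8): ', hex_bytes($utf8_out), "\n";    # c2 a3
```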


You didn't specify what you wanted to do, but I'm guessing you wanted the program to convert the input from iso-8859-1 to UTF-8. To achieve that, add

use open ':std', ':encoding(locale)';

or

use open ':std', ':encoding(UTF-8)';

These add an encoding layer to STDIN, STDOUT and STDERR (using binmode), and they set the default encoding layer for open calls in their lexical scope.
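A sketch of the question's script with the `use open` line added might look like this. The `copy_lines` helper and the loop over `@ARGV` are hypothetical additions made so the copying logic can be exercised on its own. The explicit `:encoding(iso-8859-1)` layer on the input still takes precedence over the UTF-8 default, so `./script.pl test.txt | hexdump -C` should now show `c2 a3 0a`.

```perl
use strict;
use warnings;

# Put a UTF-8 encoding layer on STDIN, STDOUT and STDERR, and make it the
# default for open() calls in this lexical scope.
use open ':std', ':encoding(UTF-8)';

# Hypothetical helper: copy lines from $in to $out. Decoding and encoding
# are done by the layers on the two handles; the loop only sees characters.
sub copy_lines {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        print {$out} $line;
    }
}

for my $file (@ARGV) {
    # The explicit layer overrides the UTF-8 default set by `use open`.
    open my $f, '<:encoding(iso-8859-1)', $file
        or die qq{Could not open $file: $!};
    copy_lines($f, \*STDOUT);
    close $f;
}
```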

Roman Cheplyaka

I was assuming that file handles opened without an explicit encoding layer use UTF-8 by default, but apparently that isn't true.

Adding an explicit

binmode(STDOUT, ":utf8");

fixes the problem.
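One way to confirm that STDOUT has no UTF-8 layer by default is to list its layers with the built-in `PerlIO::get_layers` before and after the `binmode` call. This sketch assumes no `-C` switch, `PERL_UNICODE` setting or `use open` is in effect; the exact default layer names vary by platform.

```perl
use strict;
use warnings;

# Without -C, PERL_UNICODE or `use open`, STDOUT normally has only byte
# layers (for example "unix perlio"; Windows may show "crlf" as well).
my @before = PerlIO::get_layers(*STDOUT);
print STDERR "STDOUT layers before: @before\n";

# The fix from this answer: flag STDOUT as UTF-8 so characters are encoded.
binmode(STDOUT, ':utf8');

my @after = PerlIO::get_layers(*STDOUT);
print STDERR "STDOUT layers after:  @after\n";

print "\xA3\n";    # now written as the bytes c2 a3 0a
```

On output, `:utf8` and `:encoding(UTF-8)` write the same bytes for ordinary text. The stricter `:encoding(UTF-8)` matters mainly on input, where `:utf8` does not validate the data.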
