Michael

Reputation: 8526

Perl's IO::File and use open qw(:utf8)

IO::File->open() doesn't seem to respect use open in the following program, which is odd to me and seems to contradict the documentation. Or maybe I'm doing it wrong. (Rewriting my code to not use IO::File shouldn't be difficult.)

I expect the output to be

$VAR1 = \"Hello \x{213} (r-caret)";

Hello ȓ (r-caret)
Hello ȓ (r-caret)
Hello ȓ (r-caret)

But I'm getting this error: "Oops: Malformed UTF-8 character (unexpected end of string) in print at ./run.pl line 33."

That doesn't seem right to me at all.

#!/usr/local/bin/perl

use utf8;
use v5.16;
use strict;
use warnings;
use warnings qw(FATAL utf8);
use diagnostics;
use open qw(:std :utf8);
use charnames qw(:full :short);

use File::Basename;
my $application = basename $0;

use Data::Dumper;
$Data::Dumper::Indent = 1;

use Try::Tiny;

my $str = "Hello ȓ (r-caret)";

say Dumper(\$str);

open(my $fh, '<', \$str);
print while ($_ = $fh->getc());
close($fh);
print "\n";

try {
  use IO::File;
  my $fh = IO::File->new();
  $fh->open(\$str, '<');
  print while ($_ = $fh->getc());
  $fh->close();
  print "\n";
}
catch {
  say "\nOops: $_";
};

try {
  use IO::File;
  my $fh = IO::File->new();
  $fh->open(\$str, '<:encoding(UTF-8)');
  print while ($_ = $fh->getc());
  $fh->close();
  print "\n";
}
catch {
  say "\nOops: $_";
};

Upvotes: 2

Views: 3053

Answers (2)

ikegami

Reputation: 385764

[This is not an answer, but a notification of a bug that doesn't fit in a comment.]

Files can only contain bytes. $str contains values that aren't bytes. Therefore,

open(my $fh, '<', \$str)

makes no sense. It should be

open(my $fh, '<', \encode_utf8($str))

use utf8;
use v5.16;
use strict;
use warnings;
use warnings qw(FATAL utf8);
use open qw( :std :utf8 );
use Encode qw( encode_utf8 );
use Data::Dumper qw( Dumper );

sub dump_str {
   local $Data::Dumper::Useqq = 1;
   local $Data::Dumper::Terse = 1;
   local $Data::Dumper::Indent = 0;
   return Dumper($_[0]);
}

for my $encode (0..1) {
   for my $orig ("\x{213}", "\x{C9}", substr("\x{C9}\x{213}", 0, 1)) {
      my $file_ref = $encode ? \encode_utf8($orig) : \$orig;
      my $got = eval { open(my $fh, '<', $file_ref); <$fh> };
      printf("%-10s  %-6s  %-9s => %-10s => %s\n",
         $encode ? "bytes" : "codepoints",
         defined($got) && $orig eq $got ? "ok" : "not ok",
         dump_str($orig),
         dump_str($$file_ref),
         defined($got) ? dump_str($got) : 'DIED',
      );
   }
}

Output:

codepoints  ok      "\x{213}" => "\x{213}"  => "\x{213}"
codepoints  not ok  "\311"    => "\311"     => DIED
codepoints  not ok  "\x{c9}"  => "\x{c9}"   => DIED
bytes       ok      "\x{213}" => "\310\223" => "\x{213}"
bytes       ok      "\311"    => "\303\211" => "\x{c9}"
bytes       ok      "\x{c9}"  => "\303\211" => "\x{c9}"
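Applied to the question's IO::File case, the fix might look like the minimal sketch below (not from the original answer; the variable names are illustrative). It encodes the string to UTF-8 bytes before opening it, then asks IO::File for the decoding layer explicitly in the mode string. IO::File passes a mode containing ':' straight to the three-argument open(), so the caller's use open is not needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use IO::File;

# Text as characters: U+0213 is the r-caret from the question.
my $str = "Hello \x{213} (r-caret)";

# An in-memory "file" should hold bytes, so encode before opening it.
my $bytes = encode_utf8($str);

# IO::File does not see the caller's "use open", so request the decoding
# layer in the mode string; a mode containing ':' goes to 3-arg open().
my $fh = IO::File->new(\$bytes, '<:encoding(UTF-8)')
    or die "Cannot open in-memory handle: $!";

# Test getc() with defined() so a "0" character would not end the loop early.
my $got = '';
while (defined(my $c = $fh->getc())) {
    $got .= $c;
}
$fh->close();

binmode(STDOUT, ':encoding(UTF-8)');
print "$got\n";
```

The defined() test is a small deviation from the question's loop, which would stop early on a "0" character because "0" is false in Perl.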

Upvotes: 3

Schwern

Reputation: 164809

I believe what's happening here is that use open is a lexical pragma, meaning it only affects calls to open() in the same lexical scope. Code is in the same lexical scope when it sits textually inside the same block (or file) as the pragma. IO::File->open is a wrapper around open(), so the open() it calls is written in IO/File.pm, outside your lexical scope.

{
    use open;

    ...same lexical scope...

    {
        ...inner lexical scope...
        ...inherits from the outer...
    }

    ...still the same lexical scope...
    foo();
}

sub foo {
    ...outside "use open"'s lexical scope...
}

In the example above, even though foo() is called from inside use open's lexical scope, the code inside foo() is not in that scope and thus is not affected by it.
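A runnable version of that picture, as a sketch (not from the original answer): use open is placed in a block, and an open() written inside the block is compared with one in a sub defined outside it. The sub read_outside is a hypothetical stand-in for IO::File's wrapper, and the two-byte test data is an illustrative assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The two UTF-8 bytes of U+0213 (r-caret).
my $bytes = "\xC8\x93";

# Hypothetical stand-in for a wrapper like IO::File->open: it is defined
# outside the block below, so that block's "use open" does not apply here.
sub read_outside {
    open(my $fh, '<', \$bytes) or die "open: $!";
    return scalar <$fh>;
}

my ($inside, $outside);
{
    # Lexical: only affects open() calls written inside this block.
    use open IN => ':encoding(UTF-8)';

    open(my $fh, '<', \$bytes) or die "open: $!";
    $inside  = <$fh>;            # decoded by the default layer
    $outside = read_outside();   # called from here, but compiled outside
}

printf "inside:  length %d\n", length $inside;
printf "outside: length %d\n", length $outside;
```

Inside the block the two bytes come back as one character (length 1); the sub gets the raw bytes (length 2), even though it is called from within the block.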

It would be polite if IO::File inherited the caller's open.pm settings. This is not trivial, but it is possible. A similar problem plagued autodie; it was fixed there, and the same fix could probably work for IO::File.
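Until then, one workaround (my assumption, not something stated in the answer) is to open with IO::File and then push the layer yourself with binmode. This works on IO::File objects because they are blessed glob references that the core binmode accepts. A minimal sketch with the same r-caret data:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::File;

my $bytes = "\xC8\x93";    # the two UTF-8 bytes of U+0213 (r-caret)

# Open with a plain mode, as in the question's first IO::File attempt ...
my $fh = IO::File->new();
$fh->open(\$bytes, '<') or die "open: $!";

# ... then push the decoding layer onto the handle explicitly.
binmode($fh, ':encoding(UTF-8)') or die "binmode: $!";

my $char = $fh->getc();    # one character, not one byte
$fh->close();

printf "U+%04X\n", ord $char;
```

This keeps the layer choice next to the open call, so it no longer matters which lexical scope IO::File's internal open() runs in.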

Upvotes: 6
