Reputation: 5317
use open qw( :encoding(UTF-8) :std );
The statement above seems to be effective only within its lexical scope and should not affect anything outside of that scope. But I have observed the following.
$ cat data
€
#1
$ perl -e '
open (my $fh, "<:encoding(UTF-8)", "data");
print($_) while <$fh>;'
Wide character in print at -e line 1, <$fh> line 1.
€
The Wide character ... warning is expected here. But
#2
$ perl
my ($fh, $row);
{
use open qw( :encoding(UTF-8) :std );
open ($fh, "<", "data");
}
$row = <$fh>;
chomp($row);
printf("%s (0x%X)", $row, ord($row));
€ (0x20AC)
This does not show the wide character warning! Here is what's going on here, IMO.
Now look at the following slight variation:
#3
my ($fh, $row);
{
use open qw( :encoding(UTF-8) :std );
}
open ($fh, "<", "data");
$row = <$fh>;
chomp($row);
printf("%s (0x%X)", $row, ord($row));
â¬ (0xE2)
This time, since the open statement is outside the lexical scope of the pragma, open opened the file in non-UTF-8 mode.
Does this mean that the use open qw( :encoding(UTF-8) :std ); statement changes STDOUT globally, but STDIN only within its lexical scope?
Upvotes: 1
Views: 246
Reputation: 40778
Unfortunately, the open qw(:std) pragma does not seem to behave as a lexical pragma, since it changes the IO layers associated with the standard handles STDIN, STDOUT and STDERR globally. Even code earlier in the source file is affected, since the use statement is executed at compile time. So the following
use feature qw(say);
say join ":", PerlIO::get_layers(\*STDIN);
{
use open qw( :encoding(UTF-8) :std );
}
prints (on my Linux platform):
unix:perlio:encoding(utf-8-strict):utf8
whereas without the use open qw( :encoding(UTF-8) :std ) it would just print
unix:perlio
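For contrast, the default-layer half of the pragma (that is, without :std) does appear to stay lexical. The following is only a minimal sketch under some assumptions: it writes its own scratch file through File::Temp instead of using the data file from the question, and the variable names $inside and $outside are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file holding the UTF-8 bytes of the euro sign (E2 82 AC) plus a newline.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':raw';
print {$tmp} "\xE2\x82\xAC\n";
close $tmp or die "close failed: $!";

my ($inside, $outside);
{
    # No :std here, so only open() calls compiled inside this block
    # pick up the default layer.
    use open qw( :encoding(UTF-8) );
    open $inside, '<', $path or die "open failed: $!";
}
# Same call outside the block: no default layer is applied.
open $outside, '<', $path or die "open failed: $!";

my $layers_inside  = join ':', PerlIO::get_layers($inside);
my $layers_outside = join ':', PerlIO::get_layers($outside);

chomp(my $row_in  = <$inside>);
chomp(my $row_out = <$outside>);

# Only ASCII is printed, so STDOUT needs no encoding layer for this report.
printf "inside:  %s, first ord 0x%X\n", $layers_inside,  ord($row_in);
printf "outside: %s, first ord 0x%X\n", $layers_outside, ord($row_out);
```

The handle opened inside the block should report an encoding layer and yield a single character (0x20AC), while the one opened outside should yield the first raw byte (0xE2), which matches examples #2 and #3 in the question.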
One way to avoid affecting the global STDOUT, for example, is to duplicate the handle within a lexical scope and then add IO layers to the duplicate handle within that scope:
use feature qw(say);
use strict;
use warnings;
use utf8;
my $str = "€";
say join ":", PerlIO::get_layers(\*STDOUT);
{
open ( my $out, '>&STDOUT' ) or die "Could not duplicate stdout: $!";
binmode $out, ':encoding(UTF-8)';
say $out $str;
}
say join ":", PerlIO::get_layers(\*STDOUT);
say $str;
with output:
unix:perlio
€
unix:perlio
Wide character in say at ./p.pl line 16.
€
Upvotes: 1
Reputation: 132905
You aren't using STDIN. You're opening a file with an explicit encoding (except for your last example) and reading from that.
The use open qw(:std ...) pragma affects the standard file handles, but you're only using standard output. When you don't use that and print UTF-8 data to standard output, you get the warning.
In your last example, you don't read the data with an explicit encoding, so when you print it to standard output, it's already corrupted.
That's the trick of encodings no matter what they are. Every part of the process has to be correct.
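To make the "every part has to be correct" point concrete, here is a small sketch that does the decoding and encoding by hand with the Encode module instead of IO layers. It is an illustration only; the variable names are invented, and $mangled imitates what presumably happens in the last example, where undecoded bytes reach an output handle that encodes them again.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $bytes_in = "\xE2\x82\xAC";               # the euro sign as UTF-8 bytes

# Correct pipeline: decode on input, encode on output.
my $chars     = decode('UTF-8', $bytes_in);  # one character, U+20AC
my $bytes_out = encode('UTF-8', $chars);     # the same three bytes again

# Broken pipeline: the input was never decoded, so each byte is treated
# as a separate character and encoded a second time on output.
my $mangled = encode('UTF-8', $bytes_in);

printf "decoded: %d character(s), ord 0x%X\n", length($chars), ord($chars);
printf "round trip intact: %s\n", $bytes_out eq $bytes_in ? 'yes' : 'no';
printf "double-encoded length: %d bytes\n", length($mangled);
```

The broken branch does not fail loudly; it simply produces six bytes where three were expected, which is why the output of the last example looks like garbage rather than raising a warning.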
If you want use open to affect all file handles, you have to import it differently. There are several examples at the top of the documentation.
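As one possible reading of "import it differently", the sketch below places the pragma at file scope without :std, so that every open() compiled later in the same file gets the layer while the standard handles are left alone. It assumes a Unix-like system (no CRLF translation) and uses a File::Temp scratch file that is not part of the question.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# File-scope import without :std: open() calls compiled below this line
# in this file get the default layer; STDIN/STDOUT/STDERR are not touched.
use open qw( :encoding(UTF-8) );

# Scratch file, used only for this demonstration.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp or die "close failed: $!";

open my $out, '>', $path or die "open for write failed: $!";
print {$out} "\x{20AC}\n";       # one character, encoded by the layer
close $out or die "close failed: $!";

my $size = -s $path;             # bytes on disk: E2 82 AC 0A

open my $in, '<', $path or die "open for read failed: $!";
chomp(my $row = <$in>);          # decoded back to one character
close $in;

printf "bytes on disk: %d, ord after reading: 0x%X\n", $size, ord($row);
```

Adding :std to the import list would additionally put the layer on the standard handles, with the global effect described in the other answer.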
Upvotes: 3