Sinan Ünür

Reputation: 118128

In Perl, how can I check if an encoding specified in a string is valid?

Say I have a sub that receives two arguments: an encoding specification and a file path. The sub uses them to open a file for reading, as shown below, stripped down to its essentials:

run({
    encoding => 'UTF-16---LE',
    input_filename => 'test_file.txt',
});

sub run {
    my $args = shift;
    my ($enc, $fn) = @{ $args }{qw(encoding input_filename)};

    my $is_ok = open my $in,
        sprintf('<:encoding(%s)', $args->{encoding}),
        $args->{input_filename}
    ;
}

Now, this croaks with:

Cannot find encoding "UTF-16---LE" at E:\Home\...

What is the right way to ensure that $args->{encoding} holds a valid encoding specification before interpolating it into the second argument to open?

Update

The information below is provided in the hope that it will be useful to someone at some point. I am also going to file a bug report.

The documentation for Encode::Alias does not mention find_alias at all. A casual look at Encode/Alias.pm on my Windows system reveals:

# Public, encouraged API is exported by default

our @EXPORT =
  qw (
  define_alias
  find_alias
);

However, note:

#!/usr/bin/env perl

use 5.014;
use Encode::Alias;
say find_alias('UTF-8')->name;

yields:

Use of uninitialized value $find in exists at C:/opt/Perl/lib/Encode/Alias.pm line 25.
Use of uninitialized value $find in hash element at C:/opt/Perl/lib/Encode/Alias.pm line 26.
Use of uninitialized value $find in pattern match (m//) at C:/opt/Perl/lib/Encode/Alias.pm line 31.
Use of uninitialized value $find in lc at C:/opt/Perl/lib/Encode/Alias.pm line 40.
Use of uninitialized value $find in pattern match (m//) at C:/opt/Perl/lib/Encode/Alias.pm line 31.
Use of uninitialized value $find in lc at C:/opt/Perl/lib/Encode/Alias.pm line 40.

Being 1) lazy, and 2) first to assume I am doing something wrong, I decided to seek others' wisdom.

In any case, the bug arises because find_alias is exported as a function, while its code assumes it is always invoked as a method:

sub find_alias {
    require Encode;
    my $class = shift;
    my $find  = shift;
    unless ( exists $Alias{$find} ) {

If find_alias is not invoked as a method, the argument is now in $class and $find is undefined.
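
The same failure mode can be reproduced in isolation. This is only a minimal sketch: the Demo package and its lookup sub are hypothetical stand-ins, not part of Encode, and they copy just the argument handling shown in the excerpt above.

```perl
use strict;
use warnings;

package Demo;    # hypothetical package, not part of Encode

# Written as a method: the first argument is assumed to be the invocant.
sub lookup {
    my $class = shift;
    my $find  = shift;
    return $find;
}

package main;

# Method call: 'Demo' goes into $class, 'UTF-8' into $find.
my $as_method = Demo->lookup('UTF-8');

# Plain function call: 'UTF-8' goes into $class, so $find is undef.
my $as_function = Demo::lookup('UTF-8');

printf "method call:   %s\n", defined $as_method   ? $as_method   : 'undef';
printf "function call: %s\n", defined $as_function ? $as_function : 'undef';
```
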

HTH.

Upvotes: 12

Views: 1608

Answers (2)

cjm

Reputation: 62099

You can use the find_encoding function in Encode. If you want to use the result in an :encoding layer, though, you should also check perlio_ok, because it is possible (but rare) for an encoding to exist but not support use with :encoding:

use Carp qw(croak);
use Encode qw(find_encoding);

sub run {
    my $args = shift;
    my $enc = find_encoding($args->{encoding}) 
      or croak "$args->{encoding} is not a valid encoding";
    $enc->perlio_ok or croak "$args->{encoding} does not support PerlIO";

    my $is_ok = open my $in,
        sprintf('<:encoding(%s)', $enc->name),
        $args->{input_filename}
    ;
}

Note: find_encoding does handle aliases defined by Encode::Alias.

If you don't care about distinguishing between nonexistent encodings and those that don't support :encoding, you can just use the perlio_ok function:

Encode::perlio_ok($args->{encoding}) or croak "$args->{encoding} not supported";
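
If croaking is not wanted, the two checks can be wrapped in a helper that returns the canonical name or nothing. This is a rough sketch under that assumption: layer_encoding_name is a hypothetical name introduced here, and it relies only on find_encoding and the perlio_ok and name methods used above.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: returns the canonical name if $name is a known
# encoding that can be used in an :encoding layer, and undef otherwise.
sub layer_encoding_name {
    my ($name) = @_;
    return undef unless defined $name;

    # find_encoding returns a false value for unknown names.
    my $enc = find_encoding($name) or return undef;

    # The encoding exists, but may not support PerlIO.
    return undef unless $enc->perlio_ok;

    return $enc->name;
}

for my $candidate ('UTF-16LE', 'latin1', 'UTF-16---LE') {
    my $canonical = layer_encoding_name($candidate);
    printf "%-12s => %s\n", $candidate,
        defined $canonical ? $canonical : 'not usable';
}
```

The caller can then build the layer string from the returned name, or report the problem in whatever way suits the program.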

Upvotes: 4

daxim

Reputation: 39158

Encode::Alias->find_alias($encoding_name) returns an object whose name attribute is the canonical encoding name on success, and false on failure.

$ Encode::Alias->find_alias('UTF-16---LE')
$ Encode::Alias->find_alias('UTF-16 LE')
Encode::Unicode  {
    Parents       Encode::Encoding
    Linear @ISA   Encode::Unicode, Encode::Encoding
    public methods (6) : bootstrap, decode, decode_xs, encode, encode_xs, renew
    private methods (0)
    internals: {
        endian   "v",
        Name   "UTF-16LE",
        size   2,
        ucs2   ""
    }
}
$ Encode::Alias->find_alias('Latin9')
Encode::XS  {
    public methods (9) : cat_decode, decode, encode, mime_name, name, needs_lines, perlio_ok, renew, renewed
    private methods (0)
    internals: 140076283926592
}
$ Encode::Alias->find_alias('UTF-16 LE')->name
UTF-16LE
$ Encode::Alias->find_alias('Latin9')->name
iso-8859-15
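
Outside an interactive shell, the same lookups might be written as a small script. This is a sketch only; the candidate names are the ones from the session above, and the loop and message text are illustrative.

```perl
use strict;
use warnings;
use Encode;           # load Encode up front
use Encode::Alias;

for my $candidate ('UTF-16 LE', 'Latin9', 'UTF-16---LE') {
    # Called as a class method, so the name arrives as the second argument.
    my $enc = Encode::Alias->find_alias($candidate);

    # A false return value means no encoding matched.
    my $result = $enc ? $enc->name : 'no such encoding';
    print "$candidate => $result\n";
}
```
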

Upvotes: 5
