Reputation: 752
I am writing a program to fix mangled encoding, specifically latin1 (iso-8859-1) to Greek (iso-8859-7).
I created a function that works as intended; a variable with badly encoded text is converted properly.
When I try to convert $ARGV[0] with this function, it doesn't seem to interpret the input correctly.
Here is a test program to demonstrate the issue:
#!/usr/bin/env perl
use 5.018;
use utf8;
use strict;
use open qw(:std :encoding(utf-8));
use Encode qw(encode decode);
sub unmangle {
    my $input = shift;
    print $input . "\n";
    print decode('iso-8859-7', encode('latin1',$input)) . "\n";
}
my $test = "ÁöéÝñùìá"; # should be Αφιέρωμα
say "fix variable:";
unmangle($test);
say "\nfix argument:";
unmangle($ARGV[0]);
When I run this program with the same input as my $test variable, the results are not the same (I expected that they would be):
$ ./fix_bad_encoding.pl "ÁöéÝñùìá"
fix variable:
ÁöéÝñùìá
Αφιέρωμα
fix stdin:
ÃöéÃñùìá
ΓΓΆΓ©ΓñùìÑ
How do I get $ARGV[0] to behave the way the $test variable does?
Upvotes: 2
Views: 335
Reputation: 385647
You decoded the source (use utf8). You decoded STDIN (which you don't use), STDOUT and STDERR (use open with :std). But not @ARGV, so the arguments are still raw UTF-8 bytes. Decode them before use:
$_ = decode("UTF-8", $_) for @ARGV;
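To see the effect in isolation, here is a minimal sketch that does not depend on the shell. It assumes a UTF-8 terminal, so it fills @ARGV with the UTF-8 bytes the shell would pass, and it uses a variant of unmangle that returns the result instead of printing it (that variant and the names $raw and $fixed are illustrative, not from the question).

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);

# Variant of the question's function that returns the repaired string:
# turn the characters back into Latin-1 bytes, then read them as ISO-8859-7.
sub unmangle {
    my ($text) = @_;
    return decode('iso-8859-7', encode('latin1', $text));
}

# Assumption: simulate the raw UTF-8 bytes a UTF-8 shell would put in @ARGV.
@ARGV = (encode('UTF-8', "ÁöéÝñùìá"));
my $raw = $ARGV[0];    # keep an undecoded copy for comparison

# Decode every argument from bytes to characters before using it.
$_ = decode('UTF-8', $_) for @ARGV;

my $fixed = unmangle($ARGV[0]);

binmode(STDOUT, ':encoding(UTF-8)');
print "$fixed\n";    # prints Αφιέρωμα
```

Passing $raw to unmangle instead treats each UTF-8 byte as its own Latin-1 character, which is where the garbled second result in the question comes from.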
Upvotes: 2
Reputation: 241828
-CA tells Perl the arguments are UTF-8 encoded. Alternatively, you can decode the argument from UTF-8 yourself:
unmangle(decode('UTF-8', $ARGV[0]));
Also, it's not "stdin" (that would be reading from *STDIN), but "argument".
Upvotes: 1