Reputation: 95
I'm having trouble reading a UTF-8 file and detecting non-ASCII characters such as 'á' or 'ö'. If I declare a constant string containing the same UTF-8 characters, everything works fine; the error occurs only with the file contents.
My input file looks like:
áéíóúöüőűÁÉÍÓÚÖÜŐŰäÄß
My Perl program looks like this:
use utf8;
binmode STDOUT, ":utf8";
my $szo = "áéíóúöüőűÁÉÍÓÚÖÜŐŰäÄß";
list($szo);
while(<STDIN>){
    chomp;
    list($_);
}
sub list($){
    my ($szo) = @_;
    my @arr = split(//, $szo);
    foreach(@arr){
        my $ord = ord($_);
        if($_ eq 'á'){print "á\n";}
        print "isoe elem:$_ ord:$ord \n";
    }
}
The execution results in the following output:
á
isoe elem:á ord:225
isoe elem:é ord:233
isoe elem:í ord:237
isoe elem:ó ord:243
....
isoe elem:ß ord:223
From here on the data comes from the file, and it is wrong:
isoe elem:Ã ord:195
isoe elem:¡ ord:161
isoe elem:Ã ord:195
isoe elem:© ord:169
...
isoe elem:Ã ord:195
isoe elem: ord:159
Upvotes: 2
Views: 125
Reputation: 2040
You also need to specify the UTF-8 encoding for STDIN, just as you did for STDOUT.
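Without an input layer, Perl treats every byte read from STDIN as a separate character, which is why 'á' (UTF-8 bytes 0xC3 0xA1) shows up as ord 195 followed by ord 161. A minimal sketch of the fix, assuming the data arrives on STDIN as in the question; the loop is a simplified stand-in for your list() sub, and the choice of `:encoding(UTF-8)` over `:utf8` is my own suggestion, not something your code requires:

```perl
use strict;
use warnings;
use utf8;    # the source file itself contains UTF-8 literals

# Decode bytes coming in on STDIN and encode characters going out on STDOUT.
# ':encoding(UTF-8)' validates the input; ':utf8' (as in the question) also works.
binmode STDIN,  ':encoding(UTF-8)';
binmode STDOUT, ':encoding(UTF-8)';

while (my $line = <STDIN>) {
    chomp $line;
    # Each element is now a full character, not a single byte.
    for my $char (split //, $line) {
        print "á\n" if $char eq 'á';
        printf "elem:%s ord:%d\n", $char, ord $char;
    }
}
```

With the layer in place, the characters read from the file should report the same code points as the string literal (for example 225 for 'á'). If you open other files yourself, the same layer can be given in the open mode, e.g. `open my $fh, '<:encoding(UTF-8)', $path`, where `$path` is a placeholder for your file name.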
Upvotes: 8