eleonora

Reputation: 95

Different behaviour when using constant string versus reading data from a file

I'm having trouble reading a UTF-8 file and detecting some non-ASCII characters like 'á' or 'ö'. If I simply declare a constant string containing the UTF-8 characters, everything works fine; the error only occurs with data read from the file.

My input file looks like:

áéíóúöüőűÁÉÍÓÚÖÜŐŰäÄß

My Perl program looks like this:

use utf8;
binmode STDOUT, ":utf8";
my $szo = "áéíóúöüőűÁÉÍÓÚÖÜŐŰäÄß";
list($szo);

while (<STDIN>) {
    chomp;
    list($_);
}

sub list($) {
    my ($szo) = @_;

    my @arr = split(//, $szo);
    foreach (@arr) {
        my $ord = ord($_);
        if ($_ eq 'á') { print "á\n"; }
        print "isoe elem:$_ ord:$ord \n";
    }
}

The execution results in the following output:

á  
isoe elem:á ord:225   
isoe elem:é ord:233   
isoe elem:í ord:237   
isoe elem:ó ord:243   
....    
isoe elem:ß ord:223   
From this point on the input comes from the file, and the output is wrong:
isoe elem:Ã ord:195   
isoe elem:¡ ord:161   
isoe elem:Ã ord:195   
isoe elem:© ord:169   
...    
isoe elem:Ã ord:195   
isoe elem: ord:159
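For reference, pairs such as 195 followed by 161 in the broken output are the two raw UTF-8 bytes of 'á' (0xC3 0xA1) being treated as two separate Latin-1 characters instead of one decoded character. A quick check with the core Encode module (added here for illustration, not part of the original question) shows both views of the same bytes:

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "\xC3\xA1";    # the raw UTF-8 byte sequence for 'á'

# Undecoded, the two bytes look like two characters: 195 and 161,
# exactly as in the broken output above.
print join(" ", map { ord } split //, $bytes), "\n";

# Decoded as UTF-8, they form the single character 'á' with ord 225,
# matching the working constant-string case.
my $char = decode("UTF-8", $bytes);
print ord($char), "\n";
```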

Upvotes: 2

Views: 125

Answers (1)

pwes

Reputation: 2040

You need to specify the UTF-8 encoding for STDIN as well, just as you did for STDOUT. Otherwise the raw UTF-8 bytes from the file are read as individual characters instead of being decoded.
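A minimal sketch of the fix, assuming the file is piped in on STDIN as in the question (`:encoding(UTF-8)` is the stricter, validating spelling of the `:utf8` layer):

```perl
use strict;
use warnings;
use utf8;    # the source file itself is saved as UTF-8

binmode STDIN,  ":encoding(UTF-8)";    # decode incoming bytes into characters
binmode STDOUT, ":encoding(UTF-8)";    # encode outgoing characters into bytes

while (my $line = <STDIN>) {
    chomp $line;
    for my $ch (split //, $line) {
        # 'á' read from the file now has ord 225, same as the constant string
        printf "isoe elem:%s ord:%d\n", $ch, ord $ch;
    }
}
```

Equivalently, `use open qw(:std :encoding(UTF-8));` applies the layer to STDIN, STDOUT, and STDERR in one line.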

Upvotes: 8
