Reputation: 497
I am trying to use Text::Unidecode to transform all accented characters (é, ç, è, à, etc.) in a text file into their unaccented counterparts (e, c, e, a in this case). The function unidecode() should do just that, but I am getting strange output.
Each accented character seems to be replaced by 'A' followed by one or two other characters. For example, the input "éèçàöôäüû" produces the output "A(c)A"ASSA APA'A$?A1/4A>>".
The function works fine when I call it on a string defined in the script itself, but not when I call it on lines read in a while loop, like this:
#!/usr/bin/perl
use utf8;
use Text::Unidecode;
while(<>){
print(unidecode($_));
}
#end
The problem persists with or without use utf8; could the encoding of the text file still be causing this? Is this a known issue with the module?
Upvotes: 2
Views: 171
Reputation: 242218
use utf8 tells Perl that the source code itself is encoded in UTF-8; it has no effect on data read from files or STDIN. To set the encoding of the input, use
use open IN => ':encoding(UTF-8)', ':std';
Or, if you're not reading from a file, set the encoding of the *STDIN handle:
binmode *STDIN, ':encoding(UTF-8)';
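To see the difference a decoding layer makes, here is a minimal sketch that reads the same bytes twice, once through a raw handle and once through a handle with :encoding(UTF-8). The helper name transliterate_handle and the in-memory file standing in for your text file are assumptions made for illustration; they are not part of Text::Unidecode.

```perl
use strict;
use warnings;
use Text::Unidecode;

# Hypothetical helper: read every line from an already-opened handle
# and return the concatenated ASCII transliteration.
sub transliterate_handle {
    my ($fh) = @_;
    my $out = '';
    while (my $line = <$fh>) {
        $out .= unidecode($line);
    }
    return $out;
}

# The UTF-8 bytes of "éèç" plus a newline, written as escapes so the
# example does not depend on how this source file is saved.
my $bytes = "\xC3\xA9\xC3\xA8\xC3\xA7\n";

# No layer: each byte is treated as its own character, so unidecode()
# sees six non-ASCII characters instead of three.
open my $raw, '<', \$bytes or die "Cannot open in-memory file: $!";
my $garbled = transliterate_handle($raw);
close $raw;

# With the layer: the bytes are decoded into three characters first.
open my $decoded, '<:encoding(UTF-8)', \$bytes
    or die "Cannot open in-memory file: $!";
my $clean = transliterate_handle($decoded);
close $decoded;

print "raw:     $garbled";
print "decoded: $clean";
```

The raw handle should reproduce the start of the output in the question: the leading byte 0xC3 of each two-byte sequence is transliterated as 'A', which is where the repeated 'A' comes from. The decoded handle should give plain "eec".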
Upvotes: 4