Whitehot

Reputation: 497

Perl Text::Unidecode not producing correct output

I am attempting to use Text::Unidecode to convert all accented characters (é, ç, è, à, etc.) in a text file into their unaccented counterparts (e, c, e, a in this case). The function unidecode() should do just that, but I am getting strange output.

Each accented character seems to be replaced by 'A' followed by one or two other characters. For example, the input "éèçàöôäüû" produces the output "A(c)A"ASSA APA'A$?A1/4A>>".

The function works fine when I call it on a string literal defined in the script, but not when I call it on input read in a while loop, like this:

#!/usr/bin/perl
use utf8;
use Text::Unidecode;
while(<>){
    print(unidecode($_));
}
#end

The problem persists with or without use utf8;. Could the encoding of the text file be causing this, or is it a known issue with the module?

Upvotes: 2

Views: 171

Answers (1)

choroba

Reputation: 242218

use utf8 only tells Perl that the source code itself is encoded in UTF-8; it has no effect on how input is read. To set the encoding of the input, use

use open IN => ':encoding(UTF-8)', ':std';

Or, if you're not reading from a file, set the encoding of the *STDIN handle:

binmode *STDIN, ':encoding(UTF-8)';
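
As a minimal sketch of the binmode variant, assuming the text arrives on standard input (the explicit loop variable and the strict/warnings pragmas are additions, not part of the question's script):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Unidecode;

# Decode standard input as UTF-8 so unidecode() receives characters,
# not the individual bytes of each multi-byte sequence.
binmode STDIN, ':encoding(UTF-8)';

while (my $line = <STDIN>) {
    # unidecode() returns an ASCII-only string, so STDOUT needs no layer.
    print unidecode($line);
}
```

It could be run as, for example, perl fix.pl < input.txt, where both file names are hypothetical.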

See the documentation for the open pragma and the binmode function.
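
To illustrate where the stray 'A' characters come from, here is a small sketch that feeds unidecode() the raw UTF-8 bytes of "é" and then the decoded character. It assumes the core Encode module; the variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use Text::Unidecode;

# The UTF-8 encoding of "é" is the two bytes 0xC3 0xA9.
my $bytes = "\xC3\xA9";

# Without an input layer, Perl treats each byte as its own character
# (U+00C3 "Ã" and U+00A9 "©"), and unidecode() transliterates both.
my $undecoded = unidecode($bytes);

# After decoding, the string holds the single character U+00E9.
my $decoded = unidecode(decode('UTF-8', $bytes));

print "undecoded: $undecoded\n";   # A(c), matching the output in the question
print "decoded:   $decoded\n";     # e
```

This is the same decoding step that the :encoding(UTF-8) layer performs on every line read from the handle.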

Upvotes: 4
