Reputation: 359
I'm working on a CGI script that is called by a piece of software I can't change. The variables it submits are giving me problems: if they contain non-ASCII characters, they look like this:
ÿFFFFDEetta er texti meÿFFFFF0 ÿFFFFEDslenskum stÿFFFFF6fum
instead of
Þetta er texti með íslenskum stöfum
I've tried messing with the Encode::decode function, but nothing has come of it: all I've gotten it to do is change how the ÿ gets represented.
So yeah, I'm kind of stumped. What do I do to change all the ÿFFFFDEs into Þs and so on, without resorting to replacing each non-ASCII character individually (which is not a solution, because this needs to work for languages I don't even speak)?
Upvotes: 4
Views: 826
Reputation: 39158
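use Encode qw(decode);
use Encode::Escape qw();    # loaded for its 'unicode-escape' encoding

$_ = 'ÿFFFFDEetta er texti meÿFFFFF0 ÿFFFFEDslenskum stÿFFFFF6fum';
s/ÿFFFF/\\x/g;              # turn each ÿFFFFXX marker into the escape sequence \xXX
# Expand the \xXX escapes, then decode the resulting bytes as ISO-8859-1.
my $text = decode('iso-8859-1', decode('unicode-escape', $_));
# $text is now 'Þetta er texti með íslenskum stöfum'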
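If installing Encode::Escape is not an option, the same idea can probably be sketched with core modules only. This is a minimal sketch, not a tested drop-in: the helper name unmangle is made up for illustration, and it assumes that every marker is exactly ÿFFFF followed by two hex digits, that the ÿ reaches the script as the character U+00FF (if it arrives as raw bytes in some other encoding, the pattern has to match those bytes instead), and that the underlying bytes are ISO-8859-1, as in the answer above.

```perl
use strict;
use warnings;
use utf8;                      # the literal ÿ in this file is a character, not raw bytes
use Encode qw(decode);

# Hypothetical helper: replace each "ÿFFFFXX" marker with the single byte 0xXX,
# then decode the resulting byte string with the given encoding.
# Assumption: the encoding is ISO-8859-1 unless the caller says otherwise.
sub unmangle {
    my ($text, $encoding) = @_;
    $encoding //= 'iso-8859-1';
    $text =~ s/ÿFFFF([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return decode($encoding, $text);
}

binmode STDOUT, ':encoding(UTF-8)';
print unmangle('ÿFFFFDEetta er texti meÿFFFFF0 ÿFFFFEDslenskum stÿFFFFF6fum'), "\n";
```

Passing the encoding as a parameter keeps the helper usable if the calling software turns out to send bytes in something other than ISO-8859-1.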
Upvotes: 7