Leo Galleguillos

Reputation: 2730

Why do I get garbled output when I decode some HTML entities but not others?

In Perl, I am trying to decode strings which contain numeric HTML entities using HTML::Entities. Some entities work, while "newer" entities don't. For example:

decode_entities('&#174;');  # displays ® as expected
decode_entities('&#937;');  # displays garbled characters instead of Ω
decode_entities('&#9733;'); # displays garbled characters instead of ★

Is there a way to decode these "newer" HTML entities in Perl? In PHP, the html_entity_decode function seems to decode all of these entities without any problem.

Upvotes: 1

Views: 187

Answers (2)

ikegami

Reputation: 385764

The decoding works fine; what's wrong is how you're outputting the result. For example, you may have sent the strings to a terminal without encoding them for that terminal first. In the following program, the open pragma takes care of that encoding:

$ perl -e'
    use open ":std", ":encoding(UTF-8)";
    use HTML::Entities qw( decode_entities );
    CORE::say decode_entities($_)
       for "&#174;", "&#937;", "&#9733;";
'
®
Ω
★
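One way to confirm that the decoding itself is correct, independent of any terminal, is to print only ASCII descriptions of the result. This is a minimal sketch; the helper name describe is made up for illustration and is not part of HTML::Entities.

```perl
use strict;
use warnings;
use HTML::Entities qw( decode_entities );

# Hypothetical helper: report the length and code point of a decoded
# entity using ASCII only, so the terminal's encoding cannot interfere.
sub describe {
    my ($entity) = @_;
    my $char = decode_entities($entity);   # a one-character string
    return sprintf 'length=%d U+%04X', length($char), ord($char);
}

print "$_ => ", describe($_), "\n" for '&#174;', '&#937;', '&#9733;';
# &#174; => length=1 U+00AE
# &#937; => length=1 U+03A9
# &#9733; => length=1 U+2605
```

Each entity decodes to a single character with the expected code point, so any garbling has to happen at the output stage.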

Upvotes: 5

reflective_mind

Reputation: 1515

Make sure your terminal can handle UTF-8 encoding. It looks like it's having problems with multibyte characters. You can also set a UTF-8 encoding layer on STDOUT, which helps if you are getting "Wide character" warnings.

use strict;
use warnings;
use HTML::Entities;

binmode STDOUT, ':encoding(UTF-8)';

print decode_entities('&#174;');  # prints ®
print decode_entities('&#937;');  # prints Ω
print decode_entities('&#9733;'); # prints ★

This gives me the correct/expected results.
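To see what "multibyte" means here, the sketch below dumps the UTF-8 bytes that an encoding layer would write for each character. The helper name utf8_hex is hypothetical. The suggestion that the garbling comes from a display reading these bytes one at a time as a single-byte encoding is an assumption about the asker's setup, not something the question confirms.

```perl
use strict;
use warnings;
use Encode qw( encode );
use HTML::Entities qw( decode_entities );

# Hypothetical helper: hex dump of the UTF-8 bytes for a decoded entity.
sub utf8_hex {
    my ($entity) = @_;
    my $chars = decode_entities($entity);   # character string
    my $bytes = encode('UTF-8', $chars);    # byte string
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

print "$_ => ", utf8_hex($_), "\n" for '&#174;', '&#937;', '&#9733;';
# &#174; => C2 AE
# &#937; => CE A9
# &#9733; => E2 98 85
```

Every one of these characters takes two or three bytes in UTF-8, so a terminal that is not set to UTF-8 would show several wrong characters in place of one.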

Upvotes: 1
