mamesaye

Reputation: 2153

Encode special characters as HTML entities in Perl

I have a string in which special characters such as !, ", &, #, @ and so on can appear. Given the string

str = " Hello "XYZ" this 'is' a test & so *n @."

how can I automatically replace every special character with its HTML entity, so that I get this:

str = " Hello &quot;XYZ&quot; this &#39;is&#39; a test &amp; so on &#64;"

I tried

$str=HTML::Entities::encode_entities($str);

but it only does part of the job: the @ is not transformed into &#64;.

SOLUTION:

1) With your help (Quentin and vol7ron) I came up with this solution (1):

$HTML::Entities::char2entity{'@'} = '&#64;';
$HTML::Entities::char2entity{'!'} = '&#33;';
$HTML::Entities::char2entity{'#'} = '&#35;';
$HTML::Entities::char2entity{'%'} = '&#37;';
$HTML::Entities::char2entity{'.'} = '&#46;';
$HTML::Entities::char2entity{'*'} = '&#42;';
$str=HTML::Entities::encode_entities($str, q{@"%'.&#*$^!});

2) I also found a shorter (better) solution (2):

$str=HTML::Entities::encode_entities($str, '\W');

The '\W' does the job: it matches every non-word character, so all of them are encoded.

@vol7ron: with solution (1) you still need to specify the characters you want to translate, as Quentin mentioned earlier, even if they are already in the translation table.
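
For anyone trying solution (2), here is a small sketch, assuming HTML::Entities is installed. Because \W also matches whitespace, spaces come out as &#32;. The second call with '^\w\s' is only my own suggestion for leaving whitespace alone; it is not something from the answers.

```perl
use strict;
use warnings;
use HTML::Entities ();

my $str = q{Hello "XYZ" this 'is' a test & so on @};

# '\W' is used as the body of a character class, so every
# non-word character is encoded, spaces included.
my $all = HTML::Entities::encode_entities($str, '\W');

# Assumption (not from the thread): a negated class that leaves
# word characters and whitespace untouched.
my $no_space = HTML::Entities::encode_entities($str, '^\w\s');

print "$all\n$no_space\n";
```

Called in scalar context like this, encode_entities returns an encoded copy and leaves $str itself unchanged.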

Upvotes: 4

Views: 9116

Answers (3)

vol7ron

Reputation: 42099

You can manually add a character to the translation table (char2entity hash).

$HTML::Entities::char2entity{'@'} = '&#64;';

my $str      =  q{ Hello "XYZ" this 'is' a test & so on @};
my $encoded  =  HTML::Entities::encode_entities( $str, q{<>&"'@} );
  1. The above adds @, which will be translated to &#64;.
  2. You then need to specify the characters you want to translate; if you don't, it uses <>&" by default, so I added both @ and '. Notice that I didn't have to add ' to the translation table, because it's already there by default.
  3. You don't need to add characters in the range 0-255 (ASCII and Latin-1) to the char2entity hash, since the module does it automatically.

Note: Setting char2entity for @ was only done as an example. The module automatically sets numeric entities for characters in the range 0-255 (ASCII and Latin-1) that have no named entity. You'd have to use it for Unicode characters outside that range, though.
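
To check the note above, you can look at the table right after loading the module. This is just a quick sketch, assuming HTML::Entities is installed; the sample address is made up.

```perl
use strict;
use warnings;
use HTML::Entities ();

# Inspect the table without any manual assignment.
my $at  = $HTML::Entities::char2entity{'@'};   # numeric entity, filled in by the module
my $amp = $HTML::Entities::char2entity{'&'};   # named entity
print "$at $amp\n";

# The table entry is only used for characters listed in the second argument.
my $encoded = HTML::Entities::encode_entities(q{me@example.com}, q{@});
print "$encoded\n";
```

So the manual assignment is only needed if you want something other than the entity the module already provides.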

Upvotes: 2

Jarmund

Reputation: 3205

Cheap, dirty, and ugly, but works:

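my %translations = (
    '"' => '&quot;',
    "'" => '&#39;',
    # etc...
);

sub transform
{
    my $str = shift;
    # Build one character class from the keys and substitute in a single pass,
    # so the '&' of an already inserted entity is never encoded a second time.
    my $class = join '', map { quotemeta($_) } keys %translations;
    $str =~ s/([$class])/$translations{$1}/g;
    return $str;
}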
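
If you go the hand-rolled route, here is a self-contained sketch using only core Perl. The table contents are my own choice for illustration, not a complete list. Two things matter: escape the keys with quotemeta so characters like . or * are not treated as regex syntax, and replace in a single pass so that an & you have just inserted is not encoded again.

```perl
use strict;
use warnings;

# Illustrative table only; extend it with whatever you need.
my %translations = (
    '&' => '&amp;',
    '"' => '&quot;',
    "'" => '&#39;',
    '<' => '&lt;',
    '>' => '&gt;',
);

sub transform {
    my ($str) = @_;
    # Escape each key and build one character class from all of them.
    my $class = join '', map { quotemeta($_) } keys %translations;
    # Single pass: each original character is looked at exactly once.
    $str =~ s/([$class])/$translations{$1}/g;
    return $str;
}

print transform(q{"XYZ" & 'is' <b>}), "\n";
```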

Upvotes: -1

Quentin

Reputation: 943537

@ isn't transformed because it isn't considered to be a "special character". It can be represented in ASCII and has no significant meaning in HTML.

You can expand the range of characters that are converted with the second argument to the function you are using, as described in the documentation.
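
A minimal sketch of the difference, assuming HTML::Entities is installed (the sample string is made up):

```perl
use strict;
use warnings;
use HTML::Entities ();

my $str = q{a < b & c @ d};

# Default set: < and & are encoded, @ is left alone.
my $default = HTML::Entities::encode_entities($str);

# Second argument: the characters to encode, here with @ added.
my $extended = HTML::Entities::encode_entities($str, q{<>&"'@});

print "$default\n$extended\n";
```

Note that the second argument replaces the default set rather than adding to it, so you have to list < > & " again if you still want them encoded.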

Upvotes: 5
