tacoscool

Reputation: 93

How can I escape text for an XML document in Perl?

Anyone know of any Perl module to escape text in an XML document?

I'm generating XML which will contain text that was entered by the user. I want to correctly handle the text so that the resulting XML is well formed.

Upvotes: 3

Views: 21235

Answers (9)

Raman

Reputation: 19585

For programs that need to handle every special case, by all means use an established library for this task. However, XML has only five characters with predefined entities that may need escaping: &, <, >, " and '.

So, for one-offs where you don't want to pull in an extra library, the following Perl one-liner should suffice:

perl -pe 's/\&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g; s/"/\&quot;/g; s/'"'"'/\&apos;/g'
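
A minimal sketch of the same idea as a reusable function, using only core Perl. The name escape_xml_text and the lookup table are illustrative and do not come from any module. Doing all five replacements in a single pass also sidesteps the ordering concern of chained substitutions, where & has to be handled first:

```perl
use strict;
use warnings;

# Hypothetical helper table: the five XML special characters and their
# predefined entities.
my %XML_ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&apos;',
);

sub escape_xml_text {
    my ($text) = @_;
    # One pass over the string, so an '&' produced by an earlier
    # replacement is never escaped a second time.
    $text =~ s/([&<>"'])/$XML_ENTITY{$1}/g;
    return $text;
}

print escape_xml_text(q{x < y && "z" > 'w'}), "\n";
# prints: x &lt; y &amp;&amp; &quot;z&quot; &gt; &apos;w&apos;
```

Note that this only escapes markup characters. It does not remove characters that XML 1.0 forbids outright (most control characters), which is one of the special cases a library is better placed to handle.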

Upvotes: 0

Wadester

Reputation: 343

XML::Simple's escape_value could also be used, but XML::Simple is not recommended for new programs. See post 17436965.

A manual escape could be done using regex (copied from escape_value):

$data =~ s/&/&amp;/sg;
$data =~ s/</&lt;/sg;
$data =~ s/>/&gt;/sg;
$data =~ s/"/&quot;/sg;

Upvotes: 11

zakovyrya

Reputation: 9689

I personally prefer XML::LibXML, the Perl binding for libxml2. One of its advantages is that it uses one of the fastest XML processing libraries available. Here is an example of creating a text node:

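use XML::LibXML;
# $some_encoding, $name and $text are placeholders supplied by the caller
my $doc = XML::LibXML::Document->new('1.0', $some_encoding);
my $element = $doc->createElement($name);
$element->appendText($text);          # special characters are escaped on output
$doc->setDocumentElement($element);   # attach the element so it appears in the document
my $xml_fragment = $element->toString();
my $xml_document = $doc->toString();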

And, never, ever create XML by hand. It's gonna be bad for your health when people find out what you did.

Upvotes: 9

Jakob

Reputation: 3662

Although you had better use a module like XML::LibXML or XML::Code, you could wrap textual data in a CDATA section. You only need to take care not to put ]]> in it (this sequence is also disallowed outside of CDATA sections!):

$text =~ s/\]\]>/]]>]]&gt;<![CDATA[/g;  # /g: handle every occurrence, not just the first
$text = "<![CDATA[$text]]>";
$xml = "<foo>$text</foo>";
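
As a rough sketch, the CDATA approach can be packaged as a function in core Perl. The name cdata_wrap is hypothetical, and this version uses a slightly different, commonly seen variant of the trick: it splits the terminator across two adjacent CDATA sections, so nothing has to be written as an entity outside them:

```perl
use strict;
use warnings;

# Hypothetical helper: wraps text in a CDATA section, splitting the section
# wherever the terminator "]]>" occurs in the data.
sub cdata_wrap {
    my ($text) = @_;
    # Close the section after "]]" and reopen it before ">", so the
    # terminator never appears intact inside a single section.
    $text =~ s/\]\]>/]]]]><![CDATA[>/g;
    return "<![CDATA[$text]]>";
}

print '<foo>', cdata_wrap('a ]]> b'), '</foo>', "\n";
# prints: <foo><![CDATA[a ]]]]><![CDATA[> b]]></foo>
```

A parser reads the two sections back as the single string "a ]]> b". As with the hand-rolled substitutions, characters that XML 1.0 does not allow at all are not handled here.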

As a bonus, your code will look more Perlishly obfuscated! :-)

Upvotes: 0

muenalan

Reputation: 619

Use XML::Generator:

require XML::Generator;

my $xml = XML::Generator->new( ':pretty', escape => 'always,apos' );

print $xml->h1( " &< >non-html plain text< >&" );

which will print all content inside the tags escaped (no conflicts with the markup).

Upvotes: 3

tacoscool

Reputation: 93

After checking out XML::Code as recommended by Krish, I found that this can be done using the XML::Code set_text() method. E.g.,

use XML::Code;
my $text = XML::Code->new('=');
$text->set_text(q{> & < " ' "});
print $text->code(); # prints &gt; &amp; &lt; " ' "

Passing '=' creates a text node which, when printed, doesn't contain tags. Note: this only works for text data. It won't correctly escape attributes.
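
For attribute values, a hand-rolled sketch in core Perl might look like the following. The name escape_xml_attr is hypothetical and not part of XML::Code. It assumes the value will be placed inside double quotes, and it writes tab, newline and carriage return as character references, because parsers otherwise normalize that whitespace to spaces in attribute values:

```perl
use strict;
use warnings;

# Hypothetical helper table for double-quoted attribute values.
my %ATTR_ENTITY = (
    '&'  => '&amp;',
    '<'  => '&lt;',
    '"'  => '&quot;',
    "\t" => '&#9;',
    "\n" => '&#10;',
    "\r" => '&#13;',
);

sub escape_xml_attr {
    my ($value) = @_;
    # Single pass; '>' and "'" are left alone because they are harmless
    # inside a double-quoted attribute value.
    $value =~ s/([&<"\t\n\r])/$ATTR_ENTITY{$1}/g;
    return $value;
}

my $title = qq{say "hi" & <wave>\n};
print '<item title="', escape_xml_attr($title), '"/>', "\n";
# prints: <item title="say &quot;hi&quot; &amp; &lt;wave>&#10;"/>
```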

Upvotes: 2

hovenko

Reputation: 713

XML::Entities:

use XML::Entities;
my $a_encoded = XML::Entities::numify('all', $a);

Edit: XML::Entities::numify only converts named HTML entities to numeric ones; it does not escape raw text. Use encode_entities($a) from HTML::Entities instead.

Upvotes: 3

Sinan Ünür

Reputation: 118128

I am not sure why you need to escape text that is in an XML file. If your file contains:

<foo>x < y</foo>

The file is not an XML file, despite the proliferation of angle brackets. An XML file must be well formed, meaning something like this:

<foo>x &lt; y</foo>

or

<foo><![CDATA[x < y]]></foo>

Therefore, either:

  1. You are not asking for escaping data in an XML file. Rather, you want to figure out how to put character data in an XML file so the resulting file is valid XML; or

  2. You have some data in an XML file that needs to be escaped for some other reason.

Care to elaborate?

Upvotes: 8

joe

Reputation: 35077

Use XML::Code.

From CPAN

XML::Code escape()

Normally any content of the node will be escaped during rendering (i. e. special symbols like '&' will be replaced by corresponding entities). Call escape() with zero argument to prevent it:

        my $p = XML::Code->new('p');
        $p->set_text ("&#8212;");
        $p->escape (0);
        print $p->code(); # prints <p>&#8212;</p>
        $p->escape (1);
        print $p->code(); # prints <p>&amp;#8212;</p>

Upvotes: 6
