user4035

Reputation: 23759

Encode & symbol in XML

My Perl program processes an XML file. Some entries may contain unescaped & symbols, and the parser then fails with the error "Invalid name in entity".

How can I preprocess the file and encode every & that is not part of a valid entity?

So, it will be something like this:

<words>text1 & text2</words>  -->  <words>text1 &amp; text2</words>

Upvotes: 2

Views: 407

Answers (2)

Michael Kay

Reputation: 163655

It's not an XML file. If it were XML, the & would be written as &amp;. Processing non-XML files is difficult because you can't use an XML parser. It's best to fix the program that created this file, changing it to produce proper well-formed XML.
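
To illustrate what fixing the producer could look like, here is a minimal Perl sketch that escapes text content before it is written between tags. The helper name escape_xml_text is hypothetical, and the sketch assumes the generating program builds its XML by string concatenation; a program that uses an XML writer library would normally get this escaping for free.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in XML
# text content. Intended for raw text only, not for existing markup.
sub escape_xml_text {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # must run first, or the entities added below get escaped again
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Prints: <words>text1 &amp; text2</words>
print "<words>", escape_xml_text("text1 & text2"), "</words>\n";
```

Attribute values would additionally need quote characters escaped, which this sketch does not do.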

Upvotes: 6

dnet

Reputation: 1429

It's tricky and usually involves tradeoffs. When I ran into a similar problem, replacing every & that is followed by an uppercase letter or whitespace (/&[A-Z ]/ as a regexp) with &amp; plus that trailing character fixed most cases. That is usually good enough, since you're already going the extra mile by accepting input that is not well-formed XML.
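
The heuristic can be sketched in Perl as below. The function names are hypothetical. The second function is a stricter variant that is an assumption on my part, not part of the approach described above: it escapes any & that does not begin something shaped like an entity or character reference. Both operate on the raw text before it reaches the parser, and both will also rewrite ampersands inside CDATA sections and comments, so treat them as a best-effort repair.

```perl
use strict;
use warnings;

# Heuristic from the answer: escape & when it is followed by an
# uppercase letter or a space, keeping that trailing character.
sub escape_amp_heuristic {
    my ($xml) = @_;
    $xml =~ s/&([A-Z ])/&amp;$1/g;
    return $xml;
}

# Stricter variant (assumption): escape every & that is NOT the start of
# a named reference (&name;), a decimal reference (&#38;) or a
# hexadecimal reference (&#x26;).
sub escape_stray_amp {
    my ($xml) = @_;
    $xml =~ s/&(?!(?:[A-Za-z_:][\w.:-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;
    return $xml;
}

# Prints: <words>text1 &amp; text2</words>
print escape_amp_heuristic("<words>text1 & text2</words>"), "\n";
```

The tradeoff shows up in inputs such as "R&D; etc.": the stricter variant sees &D; as a reference and leaves it alone, while the uppercase heuristic escapes it but misses a case like "a&b". Neither can tell intent from syntax alone.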

Upvotes: 3
