My Perl program is processing an XML file. Some entries contain bare & characters, and the parser fails with: "Invalid name in entity".
How can I process the file and escape every & that does not start a valid entity?
For example:
<words>text1 & text2</words> --> <words>text1 &amp; text2</words>
It's not an XML file. If it were XML, the & would be written as &amp;. Processing non-XML files is difficult because you can't use an XML parser on them. It's best to fix the program that created this file so that it produces well-formed XML.
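If you control the program that writes the file, the fix belongs there: every text value needs its markup characters escaped before it is written. Below is a minimal sketch in core Perl, assuming the producer builds its XML by string concatenation. `xml_escape` is a hypothetical helper name, and an XML writer module would normally handle this for you.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the five characters that are special in XML
# text and attribute values. Using one substitution with a lookup table
# means "&" is handled in the same pass, so the "&" inside the replacement
# strings is never escaped a second time.
my %xml_entity = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&apos;',
);

sub xml_escape {
    my ($text) = @_;
    (my $escaped = $text) =~ s/([&<>"'])/$xml_entity{$1}/g;
    return $escaped;
}

# Usage: escape the value before wrapping it in tags.
my $value = 'text1 & text2';
print '<words>', xml_escape($value), "</words>\n";
# prints: <words>text1 &amp; text2</words>
```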
It's tricky, non-trivial, and usually involves tradeoffs. When I ran into a similar problem, replacing & characters followed by either an uppercase letter or a space (/\&[A-Z ]/ as a regexp) with &amp; (keeping the trailing character) solved most cases. That is usually good enough, since you're already going the extra mile by accepting input that isn't well-formed XML.
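The heuristic above maps directly onto a Perl substitution. The sketch below also includes a stricter variant that is not from this answer: it escapes any & that is not followed by something shaped like an entity or character reference plus a semicolon (an ASCII-only approximation of XML's Name rule). It leaves an undefined entity such as &foo; untouched, so the parser could still reject that. Both function names are hypothetical; run either one on the raw file contents before handing the string to your parser.

```perl
use strict;
use warnings;

# Heuristic from this answer: a bare "&" followed by an uppercase letter or
# a space is assumed not to start an entity, so it becomes "&amp;" and the
# captured trailing character is put back unchanged.
sub fix_ampersands_heuristic {
    my ($xml) = @_;
    $xml =~ s/&([A-Z ])/&amp;$1/g;
    return $xml;
}

# Stricter alternative (not from the answer): escape every "&" unless it is
# followed by a name-like token, "#digits", or "#xhex", and then ";".
sub fix_ampersands_strict {
    my ($xml) = @_;
    $xml =~ s/&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;
    return $xml;
}

# Usage: the two variants differ on an "&" followed by a lowercase letter.
my $input = "<words>text1 & text2</words><words>fish&chips</words>\n";
print fix_ampersands_heuristic($input);
# prints: <words>text1 &amp; text2</words><words>fish&chips</words>
print fix_ampersands_strict($input);
# prints: <words>text1 &amp; text2</words><words>fish&amp;chips</words>
```

The heuristic never touches an existing &amp; (the next character is lowercase), but it also misses cases like fish&chips; the stricter version catches those at the cost of trusting anything that merely looks like an entity.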