Doug Lerner

Reputation: 1543

How to strip invalid XML 1.0 characters from a JavaScript string?

Does anybody know of a JavaScript function anywhere which takes a string and returns it stripped of invalid XML 1.0 characters?

I'm trying to create valid XML 1.0 from content extracted from a database containing UTF-8 data, but some of the data contains invalid characters, so the XML I create won't validate.

The language used for accessing the data and creating the XML is server-side JavaScript.

Upvotes: 1

Views: 1647

Answers (1)

Doug Lerner

Reputation: 1543

I found a way to strip out at least the characters that were making my XML 1.0 invalid. It is rather a kludge, the earlier lines look somewhat repetitive given the last line, and I'm sure there must be a better way of doing it. But it works.

If somebody has a better answer, please let me know. Thanks.

str = str.replace(/\u00B7/g,'');
str = str.replace(/\u00C2/g,'');
str = str.replace(/\u00A0/g,'');
str = str.replace(/\u00A2/g,'');
str = str.replace(/\u00A3/g,'');
str = str.replace(/[^\u000D\u00B7\u0020-\u007E\u00A2-\u00A4]/g,'');
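For comparison, here is a minimal sketch of a more general approach, assuming the goal is to keep exactly the characters the XML 1.0 specification allows (the Char production: #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF). The function name stripInvalidXmlChars is hypothetical and not part of the original answer. Unlike the snippet above, it keeps tabs, newlines, and legal non-ASCII characters such as U+00B7 and U+00A0, so it may not produce the same output on the same data.

```javascript
// Hypothetical helper (name is an assumption, not from the original answer).
// Removes every UTF-16 code unit that is not part of a legal XML 1.0 character.
function stripInvalidXmlChars(str) {
  // First alternative: a well-formed surrogate pair (U+10000-U+10FFFF), kept as-is.
  // Second alternative: anything outside #x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD,
  // which covers control characters, lone surrogates, U+FFFE and U+FFFF; these are dropped.
  return String(str).replace(
    /([\uD800-\uDBFF][\uDC00-\uDFFF])|[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD]/g,
    function (match, pair) {
      return pair || '';
    }
  );
}
```

It would be called the same way as the replacements above, for example str = stripInvalidXmlChars(str); before the value is written into the XML. It only removes illegal characters; escaping of &, < and > still has to happen separately.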

Upvotes: 2
