Escape extra quotes in malformed xml

Question

I've malformed xml file that contains extra quotes in a tag. I would like to remove them or replace by "e. Malformed XML looks looks like:

My expected result:

or

I've tried to iterate through all lines and finding ATT first and last indexes but it's quite dirty and produces too much code.

Do anyone have simple solution for this?

Tim Pietzcker · Accepted Answer

This is not 100% foolproof, but might work with a little luck:

re.sub(r'(?)', '"', malformed_xml)

will only replace quotes that are neither preceded by = nor followed by >.

If there could be whitespace after = (or before >), you can't use the re module anymore, but the regex module (PyPI) can work with this:

regex.sub(r'(?)', '"', malformed_xml)

Answers (2)