Reputation: 5542
I have an XML file, and a Python script adds a new node to it. I use the xml.dom.minidom module to process the file. After processing with the Python module, my XML file looks like this:
<?xml version="1.0" ?><Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PostBuildEvent>
<Command>xcopy &quot;SourceLoc&quot; &quot;DestLoc&quot;</Command>
</PostBuildEvent>
<ImportGroup Label="ExtensionTargets">
</ImportGroup>
<Import Project="project.targets"/></Project>
What I actually need is shown below. The changes are a newline after the first line and before the last line, and '&quot;' converted back to '"'.
<?xml version="1.0" ?>
<Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PostBuildEvent>
<Command>xcopy "SourceLoc" "DestLoc"</Command>
</PostBuildEvent>
<ImportGroup Label="ExtensionTargets">
</ImportGroup>
<Import Project="project.targets"/>
</Project>
The Python code I used is given below:
import xml.dom.minidom
vcxprojxmltree = xml.dom.minidom.parse(VcxProjFile)
Project = vcxprojxmltree.documentElement
# Create the new <Import> element and append it to <Project>
newImport = vcxprojxmltree.createElement("Import")
newImport.setAttribute("Project", "project.targets")
Project.appendChild(newImport)
with open(VcxProjFile, 'w') as f:
    vcxprojxmltree.writexml(f)
What should I change in my code to get the XML in the correct format?
Thanks,
Upvotes: 2
Views: 865
Reputation: 1011
From docs of minidom:
Node.toprettyxml([indent=""[, newl=""[, encoding=""]]])
Return a pretty-printed version of the document. indent specifies the indentation string and defaults to a tabulator; newl specifies the string emitted at the end of each line and defaults to \n.
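A minimal sketch of how those two parameters behave, written for Python 3 and using a shortened, hypothetical stand-in for the project file from the question:

```python
import xml.dom.minidom

# Hypothetical input, shortened from the question's project file.
doc = xml.dom.minidom.parseString(
    '<Project><Import Project="project.targets"/></Project>'
)

# indent is repeated once per nesting level; newl is written after each node,
# including the XML declaration.
pretty = doc.toprettyxml(indent="  ", newl="\n")
print(pretty)
```

Note that this input contains no whitespace between tags. On a file that already has line breaks, the existing whitespace text nodes are kept as well, so the result may contain extra blank lines.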
That's all the customisation you get from minidom.
I tried inserting a Text node as a sibling of the root element to get the newline, without success. Hope dies last. I recommend serializing the document to a string, then inserting the newlines manually with regular expressions from the re module.
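One possible way to do the manual insertion, as a sketch in Python 3. The input string is a hypothetical, shortened version of the project file, and the two patterns assume the declaration is the first "?>" in the output and that the root's closing tag is the last thing in it:

```python
import re
import xml.dom.minidom

# Hypothetical input, shortened from the question's project file.
source = (
    '<Project DefaultTargets="Build">\n'
    '<ImportGroup Label="ExtensionTargets">\n'
    '</ImportGroup>\n'
    '</Project>'
)
doc = xml.dom.minidom.parseString(source)

# Append the new <Import> element to the root, as in the question.
new_import = doc.createElement("Import")
new_import.setAttribute("Project", "project.targets")
doc.documentElement.appendChild(new_import)

# Serialize to a string instead of writing straight to the file.
text = doc.toxml()

# Newline after the XML declaration (only the first "?>" is touched).
text = re.sub(r"\?>", "?>\n", text, count=1)

# Newline before the closing root tag, unless one is already there.
root = re.escape(doc.documentElement.tagName)
text = re.sub(r"(?<!\n)(</" + root + r">\s*)$", r"\n\1", text)

print(text)
```

The resulting string can then be written to the project file with an ordinary file write in place of writexml.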
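As for removing SGML entities, there's apparently an undocumented function for that in the Python standard library:
import HTMLParser  # Python 2 module; renamed html.parser in Python 3
h = HTMLParser.HTMLParser()
# unescape() converts entities such as &quot; back to characters
unicode_string = h.unescape(string_with_entities)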
Alternatively, you can do this manually, again using re, as all named entity names and their corresponding code points are in the htmlentitydefs module (html.entities in Python 3).
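On Python 3.4 and later, the documented html.unescape function does the same job. The sketch below uses the command string from the question. The second half is an assumption about what is wanted here: unescaping a whole serialized XML document would also turn &amp; and &lt; back into raw characters and could make it ill-formed, so it reverts only &quot;, assuming no attribute value contains an escaped quote.

```python
import html

escaped = "xcopy &quot;SourceLoc&quot; &quot;DestLoc&quot;"

# html.unescape converts every named and numeric character reference.
unescaped = html.unescape(escaped)
print(unescaped)

# For serialized XML, revert only &quot; and leave &amp; and &lt; escaped.
# The sample string is hypothetical.
serialized = "<Command>xcopy &quot;a &amp; b&quot;</Command>"
only_quotes = serialized.replace("&quot;", '"')
print(only_quotes)
```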
Upvotes: 1