I have the XML file below:
<root>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>
<catalog>
<book id="bk102">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>45.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>
<catalog>
<book id="bk103">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>46.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>
</root>
I want to create another XML file by removing the <root> tag, so that the new XML looks like this:
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>
<catalog>
<book id="bk102">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>45.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>
<catalog>
<book id="bk103">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>46.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>
Below is my code. I'm able to generate a bytes object that drops the <root> tag and keeps all the necessary <catalog> elements, but in the end I can't convert those bytes back to XML, and I get the following error:
xml.etree.ElementTree.ParseError: junk after document element: line 11, column 0
Can you please assist?
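import xml.etree.ElementTree as ET

base_tree = ET.parse('input.xml')
# Direct children of <root>, i.e. the <catalog> elements
catalog = list(base_tree.getroot())
elemList = []
for elem in catalog:
    # Serialize each <catalog> element to bytes
    getele = ET.tostring(elem, 'utf-8')
    elemList.append(getele)
byt = b''.join(elemList)
print(byt)
# The ParseError is raised here: byt holds several top-level elements
mytree = ET.ElementTree(ET.fromstring(byt))
dis = str(ET.tostring(mytree.getroot()), 'utf-8')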
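The failure in the last step can be reproduced in isolation. This is a minimal sketch that uses a small hypothetical in-memory sample (`SAMPLE`) in place of input.xml: joining the serialized children produces more than one top-level element, which `ET.fromstring` rejects, whereas simply decoding the joined bytes already gives the desired text.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory input standing in for input.xml
SAMPLE = """<root>
<catalog><book id="bk101"><price>44.95</price></book></catalog>
<catalog><book id="bk102"><price>45.95</price></book></catalog>
</root>"""

root = ET.fromstring(SAMPLE)
# Serialize every <catalog> child; each chunk also includes the element's tail text
fragment = b''.join(ET.tostring(child, 'utf-8') for child in root)

parse_failed = False
try:
    # Two top-level elements: the parser rejects everything after the first one
    ET.fromstring(fragment)
except ET.ParseError as err:
    parse_failed = True
    print('ParseError:', err)

# The joined bytes already are the desired output; decode instead of re-parsing
text = fragment.decode('utf-8')
print(text)
```

In other words, the bytes only need decoding (or writing straight to a file); parsing them again cannot succeed without a single enclosing element.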
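You can use list() to get the children of the root element:
import xml.etree.ElementTree as ET

with open('input.xml') as input_file:
    text = input_file.read()
# list() returns the direct children of <root>, i.e. every <catalog> element
catalogs = list(ET.fromstring(text))
# encoding='unicode' returns a str without an XML declaration
result = ''.join(ET.tostring(catalog, encoding='unicode', method='xml') for catalog in catalogs)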
Note, though, that the resulting string is not a valid XML document, since it has more than one top-level element.
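The same idea can be wrapped into a small file-to-file helper. This is a sketch under assumptions: `strip_root` is a hypothetical name, and the demo writes a shortened sample to hypothetical paths in a temporary directory. It also shows that the fragment can still be parsed later if it is wrapped in a temporary container element (here the hypothetical `<wrapper>`).

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def strip_root(in_path: str, out_path: str) -> str:
    """Write the children of the root element to out_path, without the root tag.

    The output is an XML fragment (several top-level elements), not a
    well-formed XML document.
    """
    root = ET.parse(in_path).getroot()
    # encoding='unicode' returns str and never adds an XML declaration
    fragment = ''.join(ET.tostring(child, encoding='unicode') for child in root)
    with open(out_path, 'w', encoding='utf-8') as out:
        out.write(fragment)
    return fragment


# Demo with hypothetical file names inside a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'input.xml')
    dst = os.path.join(tmp, 'output.xml')
    with open(src, 'w', encoding='utf-8') as f:
        f.write('<root>\n<catalog><book id="bk101"/></catalog>\n'
                '<catalog><book id="bk102"/></catalog>\n</root>\n')
    fragment = strip_root(src, dst)
    print(fragment)
    # A fragment can still be parsed if it is wrapped in a temporary container
    rewrapped = ET.fromstring('<wrapper>' + fragment + '</wrapper>')
    print(len(rewrapped))  # number of <catalog> elements
```

Using `encoding='unicode'` matters when concatenating: with an encoding such as `'utf8'` (not spelled `'utf-8'`), `tostring` can prepend an XML declaration to every chunk.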
A root element is mandatory for a well-formed XML document.
If you only need plain text processing, you could perhaps just remove the root tags with a regular expression:
import re

pattern = re.compile(r"</?root>")
removed = pattern.sub('', "<root>something</root>")
print(removed)
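Applied to the whole document, the text-based approach might look like the sketch below. `strip_root_text` is a hypothetical helper, and the pattern is a slight variation (an assumption, not part of the answer above) that also matches an opening `<root ...>` tag carrying attributes. Because it does not parse the XML, it would also rewrite matching text inside comments or CDATA sections.

```python
import re

# Matches <root>, </root> and, as an assumed extension, <root ...> with attributes;
# \b keeps it from matching longer names such as <rootfoo>
ROOT_TAG = re.compile(r'</?root\b[^>]*>')


def strip_root_text(xml_text: str) -> str:
    """Remove the root tags by plain text substitution, without parsing."""
    # strip() drops the blank lines left where the root tags used to be
    return ROOT_TAG.sub('', xml_text).strip()


sample = '<root>\n<catalog><book id="bk101"/></catalog>\n</root>\n'
print(strip_root_text(sample))
```

For a file, read its contents first, e.g. `strip_root_text(open('input.xml', encoding='utf-8').read())`.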