Nabarun Chakraborti

Reputation: 249

How to remove the root tag and keep all the child row tags in an XML file using Python

I have the XML file below.

<root>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
</catalog>
<catalog>
   <book id="bk102">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>45.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
</catalog>
<catalog>
   <book id="bk103">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>46.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
</catalog>
</root>

I want to create another XML file by removing the <root> tag, so that the new XML looks like this:

<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
</catalog>
<catalog>
   <book id="bk102">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>45.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
</catalog>
<catalog>
   <book id="bk103">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>46.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
</catalog>

Below is my code. I'm able to build a bytes object that drops the <root> tag and keeps all the necessary row tags, but I can't convert that bytes object back to XML. I get the error below:

xml.etree.ElementTree.ParseError: junk after document element: line 11, column 0

Can you please assist?

import xml.etree.ElementTree as ET

base_tree = ET.parse('input.xml')
catalog = list(base_tree.getroot())
elemList = []
for elem in catalog:
  getele = ET.tostring(elem, 'utf-8')
  elemList.append(getele)

byt = b''.join(elemList)
print(byt)

mytree = ET.ElementTree(ET.fromstring(byt))
dis = str(ET.tostring(mytree.getroot()), 'utf-8')

Upvotes: 1

Views: 967

Answers (2)

shoonya ek

Reputation: 347

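You can call list() on the root element to get its children, then serialize each child.

import xml.etree.ElementTree as ET

with open('input.xml') as input_file:
    text = input_file.read()
# Serialize every child of <root> and join them, dropping the root itself.
catalogs = list(ET.fromstring(text))
result = ''.join(ET.tostring(c, encoding='unicode') for c in catalogs)
print(result)

Note that the resulting string is not well-formed XML, because it has several top-level elements instead of a single root. That is also why parsing it again with ET.fromstring raises "junk after document element".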
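To make that last point concrete, here is a minimal sketch, assuming a shortened sample in place of input.xml. The names SAMPLE, strip_root and is_well_formed are made up for illustration and are not part of ElementTree. It joins the serialized children and then checks whether the result can be parsed again.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample with the same shape as the XML in the question.
SAMPLE = """<root>
<catalog><book id="bk101"><price>44.95</price></book></catalog>
<catalog><book id="bk102"><price>45.95</price></book></catalog>
</root>"""


def strip_root(xml_text):
    """Return the serialized children of the root element, without the root."""
    root = ET.fromstring(xml_text)
    return "".join(ET.tostring(child, encoding="unicode") for child in root)


def is_well_formed(xml_text):
    """Report whether xml_text parses as a single XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


fragments = strip_root(SAMPLE)
print(fragments)
print(is_well_formed(fragments))  # the joined fragments have two top-level elements
```

So the joined text can be written to a file or passed on as a string, but it should not be fed back into ET.fromstring unless it is wrapped in a single element again.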

Upvotes: 2

supl

Reputation: 114

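A single root element is mandatory for a document to be well-formed XML.

If this is only text processing, maybe we could simply strip the tags with a regex?

import re

# Matches the opening <root> and closing </root> tags
pattern = re.compile(r"</?root>")
removed = pattern.sub('', "<root>something</root>")

print(removed)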
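A slightly more defensive variant of the regex idea could look like the sketch below. It assumes the root tag may carry attributes, which the XML in the question does not show, and the names ROOT_TAG and remove_root_tags are made up for illustration. It is still plain text processing, so it would also strip a matching tag that appears inside a comment or CDATA section.

```python
import re

# Assumed pattern: matches <root>, </root>, and <root attr="...">,
# but not other tags whose names merely start with "root".
ROOT_TAG = re.compile(r"</?root(?:\s[^>]*)?>")


def remove_root_tags(xml_text):
    """Remove the root tags and trim the surrounding whitespace they leave behind."""
    return ROOT_TAG.sub("", xml_text).strip()


sample = '<root version="1">\n<catalog><book id="bk101"/></catalog>\n</root>'
print(remove_root_tags(sample))
```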

Upvotes: 0
