Gulbahar

Reputation: 5537

How to remove all occurrences of an element in an XML file?

I'd like to edit a KML file and remove all occurrences of ExtendedData elements, wherever they are located in the file.

Here's the input XML file:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<Document>

  <Style id="placemark-red">
    <IconStyle>
      <Icon>
        <href>http://maps.me/placemarks/placemark-red.png</href>
      </Icon>
    </IconStyle>
  </Style>

  <name>My track</name>

  <ExtendedData xmlns:mwm="https://maps.me">
    <mwm:name>
      <mwm:lang code="default">Blah</mwm:lang>
    </mwm:name>
    <mwm:lastModified>2020-04-05T14:17:18Z</mwm:lastModified>
  </ExtendedData>

  <Placemark>
    <name></name>
        …
    <ExtendedData xmlns:mwm="https://maps.me">
      <mwm:localId>0</mwm:localId>
      <mwm:visibility>1</mwm:visibility>
    </ExtendedData>
  </Placemark>
</Document>
</kml>

And here's the code, which 1) only removes the outermost occurrence, and 2) requires adding the namespace to find it:

from lxml import etree
from pykml import parser
from pykml.factory import KML_ElementMaker as KML

with open("input.xml") as f:
  doc = parser.parse(f)
root = doc.getroot()

ns = "{http://earth.google.com/kml/2.2}"

for pm in root.Document.getchildren():
    #No way to get rid of namespace, for easier search?
    if pm.tag==f"{ns}ExtendedData":
        root.Document.remove(pm)

    #How to remove innermost occurrence of ExtendedData?

print(etree.tostring(doc, pretty_print=True))

Is there a way to remove all occurrences in one go, or do I have to walk the whole tree myself?

Thank you.


Edit: The BeautifulSoup solution below needs the parser passed explicitly, i.e. BeautifulSoup(my_xml, features="lxml"), to avoid the warning "No parser was explicitly specified".

Upvotes: 0

Views: 419

Answers (3)

Parfait

Reputation: 107687

Simply run an XSLT 1.0 stylesheet, which Python's lxml supports: an Identity Transform copies the document as is, and an empty template drops every node it matches. No for/while loops or if logic needed. To handle the default namespace, define a prefix such as doc:

XSLT (save as an .xsl file, which is itself an XML file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                              xmlns:doc="http://earth.google.com/kml/2.2">
    <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- IDENTITY TRANSFORM -->
    <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
    </xsl:template>

    <!-- REMOVE ALL OCCURRENCES OF NODE -->
    <xsl:template match="doc:ExtendedData"/>

</xsl:stylesheet>

Python

import lxml.etree as et

# LOAD XML AND XSL SOURCES
xml = et.parse('Input.xml')
xsl = et.parse('XSLT_Script.xsl')

# TRANSFORM INPUT
transform = et.XSLT(xsl)
result = transform(xml)

# PRINT TO SCREEN
print(result)

# SAVE TO FILE
with open('Output.kml', 'wb') as f:
    f.write(result)

Upvotes: 0

CristiC777

Reputation: 481

If you know the XML structure, try:

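from xml.etree import ElementTree

# KML elements live in the default namespace, so searches must be qualified
ns = {"kml": "http://earth.google.com/kml/2.2"}

xml_root = ElementTree.parse(filename_path).getroot()
doc_elem = xml_root.find("kml:Document", ns)
elem = doc_elem.find("kml:ExtendedData", ns)
doc_elem.remove(elem)

or

# ElementTree does not accept absolute paths such as '/Placemark' on an element
p_elem = doc_elem.find("kml:Placemark", ns)
c_elem = p_elem.find("kml:ExtendedData", ns)
p_elem.remove(c_elem)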

Play with these ideas :)

If you don't know the XML structure, I think you need to walk the whole tree.
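
Walking the whole tree can be sketched with the standard library alone. This is only an illustration under a few assumptions: Python 3.8 or later (for the "{*}" namespace wildcard in findall), a helper name remove_all that is made up here, and a shortened copy of the question's KML as sample data.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://earth.google.com/kml/2.2"

def remove_all(root, local_name):
    """Remove every element with the given local name, at any depth.

    Hypothetical helper, not part of ElementTree. Returns the number
    of elements removed.
    """
    removed = 0
    # list() takes a snapshot, so removals do not disturb the iteration
    for parent in list(root.iter()):
        # "{*}name" matches the name in any namespace (Python 3.8+)
        for child in parent.findall(f"{{*}}{local_name}"):
            parent.remove(child)
            removed += 1
    return removed

# Shortened version of the KML from the question
sample = f"""<kml xmlns="{KML_NS}">
<Document>
  <name>My track</name>
  <ExtendedData xmlns:mwm="https://maps.me">
    <mwm:lastModified>2020-04-05T14:17:18Z</mwm:lastModified>
  </ExtendedData>
  <Placemark>
    <name></name>
    <ExtendedData xmlns:mwm="https://maps.me">
      <mwm:localId>0</mwm:localId>
    </ExtendedData>
  </Placemark>
</Document>
</kml>"""

# Serialize KML as the default namespace instead of an "ns0:" prefix
ET.register_namespace("", KML_NS)

root = ET.fromstring(sample)
count = remove_all(root, "ExtendedData")
output = ET.tostring(root, encoding="unicode")
print(count)
print(output)
```

Because the search is done on each parent in turn, the element is found whether it sits directly under Document or inside a Placemark, and the wildcard avoids spelling out the namespace in every lookup.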

Upvotes: 0

Roy2012

Reputation: 12523

Here's a solution using BeautifulSoup:

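from bs4 import BeautifulSoup

# my_xml holds your XML text; the lxml HTML parser lowercases tag names
soup = BeautifulSoup(my_xml, features="lxml")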

while True: 
    elem = soup.find("extendeddata")
    if not elem:
        break
    elem.decompose()

Here's the output for your data:

<?xml version="1.0" encoding="UTF-8"?>
<html>
 <body>
  <kml xmlns="http://earth.google.com/kml/2.2">
   <document>
    <style id="placemark-red">
     <IconStyle>
      <Icon>
        <href>http://maps.me/placemarks/placemark-red.png</href>
      </Icon>
    </IconStyle>
    </style>
    <name>
     My track
    </name>
    <placemark>
     <name>
     </name>
    </placemark>
   </document>
  </kml>
 </body>
</html>

Upvotes: 1
