Felipe Dourado
Felipe Dourado

Reputation: 441

Reading XML file with Python

I'm trying to remove the line below but without any chance. This is the file:

    <?xml version="1.0" encoding="UTF-8"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:x14="http://schemas.microsoft.com/office/spreadsheetml/2009/9/main" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:xdr="http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3" mc:Ignorable="x14ac xr xr2 xr3" xr:uid="{00000000-0001-0000-0100-000000000000}">
   <sheetPr codeName="Plan8">
      <tabColor rgb="FF7030A0" />
   </sheetPr>
   <dimension ref="A1:R38" />
   <sheetViews>
      <sheetView showGridLines="0" zoomScale="80" zoomScaleNormal="80" workbookViewId="0">
         <selection activeCell="R23" sqref="R23" />
      </sheetView>
   </sheetViews>
   <sheetFormatPr defaultColWidth="0" defaultRowHeight="12.75" customHeight="1" zeroHeight="1" x14ac:dyDescent="0.25" />
   <cols>
      <col min="1" max="19" width="9.1796875" style="2370" customWidth="1" />
      <col min="20" max="16384" width="0" style="2370" hidden="1" />
   </cols>
   <sheetData>
   </sheetData>
   <sheetProtection algorithmName="SHA-512" hashValue="9BXdsbRh4b8QbB+BS1MENieL+OkGdyWxZI57tDcwtuD7pQU0BgA+xvV6/QshPlsdJr3NhlJY/dRwURG72EZLEg==" saltValue="aumEam/6ty7Q76KhNVUg2g==" spinCount="100000" sheet="1" objects="1" scenarios="1" />
   <pageMargins left="0.511811024" right="0.511811024" top="0.78740157499999996" bottom="0.78740157499999996" header="0.31496062000000002" footer="0.31496062000000002" />
   <pageSetup paperSize="9" orientation="portrait" r:id="rId1" />
   <drawing r:id="rId2" />
   <legacyDrawing r:id="rId3" />
   <oleObjects>
   </oleObjects>
</worksheet>

My code:

        tree = etree.parse(f'{newTempFolder}/xl/worksheets/sheet2.xml')
        print(f'My tree: {tree}')

        for elem in tree.xpath('.//worksheet//sheetProtection'):
            print('Here we go...')
            parent = elem.getparent()
            parent.remove(elem)

There are no errors but I see the code is not getting into the for-loop.

Upvotes: 2

Views: 325

Answers (1)

jarmod
jarmod

Reputation: 78860

I think that you need to supply namespaces on the xpath call.

Here's an example:

from lxml import etree

tree = etree.parse("example.xml")

namespaces = {
    "t": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
}

for elem in tree.xpath("/t:worksheet", namespaces=namespaces):
    print("worksheet:", elem)

for elem in tree.xpath("/t:worksheet/t:sheetProtection", namespaces=namespaces):
    print("sheetProtection:", elem)
    parent = elem.getparent()
    print("parent:", parent)
    # parent.remove(elem)

If you then want to persist your XML changes back to file, then:

  • use tree.write('output.xml'), or
  • see this answer which stringizes the tree and writes that to disk

Upvotes: 1

Related Questions