warezsoftwarez

Reputation: 57

How to retrieve specific tag information using ElementTree?

I would like to retrieve a specific tag attribute. Each file tag contains a child tag filename, and based on its value I would like to decide whether the modification time should be read.

In other words: if the filename value contains .tar, I would like to print the modification time.

In the example below, I'd expect 2020-07-15T06:41:12.000Z to be printed.

I have been trying to do this for 2 hours without success, so I'd be really thankful for any tips that bring me closer to the solution. Here is the code, but nothing is printed or added to the dates list:

import xml.etree.ElementTree as ET

tree = ET.parse(r"C:\path\to\file\logs.xml")
root = tree.getroot()
dates = []
for filetag in root.findall('.//{*}file'):
    for filename in filetag.findall('../{*}filename'):
        if ".tar" in filename.attrib['value']:
            print(filename)
            dates.append(filename)

Here is the XML document:

<?xml version="1.0" encoding="UTF-8"?>
<session xmlns="http://winscp.net/schema/session/1.0" name="[email protected]" start="2020-07-22T10:01:12.939Z">
  <ls>
    <destination value="/folder/processing" />
    <files>
      <file>
        <filename value="." />
        <type value="d" />
        <modification value="2020-07-22T08:57:28.000Z" />
        <permissions value="rwxrwsrwx" />
        <owner value="1000130000" />
        <group value="0" />
      </file>
      <file>
        <filename value=".." />
        <type value="d" />
        <modification value="2020-07-22T08:51:15.000Z" />
        <permissions value="rwxrwxrwx" />
        <owner value="1000130000" />
        <group value="0" />
      </file>
      <file>
        <filename value="package_tsp200715092001_20200715074120.tar" />
        <type value="-" />
        <size value="4014536192" />
        <modification value="2020-07-15T06:41:12.000Z" />
        <permissions value="rw-rw-rw-" />
        <owner value="1005" />
        <group value="1005" />
      </file>
      <file>
        <filename value="package_tsp200715092001_20200715074120" />
        <type value="d" />
        <modification value="2020-07-15T06:41:59.000Z" />
        <permissions value="rwxr-Sr--" />
        <owner value="1000130000" />
        <group value="0" />
      </file>
    </files>
    <result success="true" />
  </ls>
</session>

Upvotes: 1

Views: 566

Answers (2)

balderman

Reputation: 23815

Below is a one-liner:

import xml.etree.ElementTree as ET

xml = '''

<session xmlns="http://winscp.net/schema/session/1.0" name="[email protected]" start="2020-07-22T10:01:12.939Z">
  <ls>
    <destination value="/folder/processing" />
    <files>
      <file>
        <filename value="." />
        <type value="d" />
        <modification value="2020-07-22T08:57:28.000Z" />
        <permissions value="rwxrwsrwx" />
        <owner value="1000130000" />
        <group value="0" />
      </file>
      <file>
        <filename value=".." />
        <type value="d" />
        <modification value="2020-07-22T08:51:15.000Z" />
        <permissions value="rwxrwxrwx" />
        <owner value="1000130000" />
        <group value="0" />
      </file>
      <file>
        <filename value="package_tsp200715092001_20200715074120.tar" />
        <type value="-" />
        <size value="4014536192" />
        <modification value="2020-07-15T06:41:12.000Z" />
        <permissions value="rw-rw-rw-" />
        <owner value="1005" />
        <group value="1005" />
      </file>
      <file>
        <filename value="package_tsp200715092001_20200715074120" />
        <type value="d" />
        <modification value="2020-07-15T06:41:59.000Z" />
        <permissions value="rwxr-Sr--" />
        <owner value="1000130000" />
        <group value="0" />
      </file>
    </files>
    <result success="true" />
  </ls>
</session>
'''

NS = {'scp': 'http://winscp.net/schema/session/1.0'}
root = ET.fromstring(xml)
# Still a single expression, wrapped across lines for readability.
tar_files_dates = [
    f.find('./scp:modification', NS).attrib['value']
    for f in root.findall('.//scp:file', NS)
    if '.tar' in f.find('./scp:filename', NS).attrib['value']
]
print(tar_files_dates)

Output:

['2020-07-15T06:41:12.000Z']

Upvotes: 1

Masklinn

Reputation: 42342

for filename in filetag.findall('../{*}filename'):

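Because of the .., this looks for a filename in the parent of the file element (that is, as a sibling of file). ElementTree also cannot climb above the element findall is called on, so the search matches nothing. It should be a single . instead.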

Furthermore, namespace wildcards were added in Python 3.8. You don't indicate which Python version you're using, so this may also be an issue.
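If you are on Python 3.8 or later and want to keep the wildcard syntax, here is a minimal sketch of the corrected loop. The XML is a trimmed copy of the document from the question, and the names parent_matches and child_matches are only there to illustrate the difference between .. and . (they are not part of the original code):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (two <file> entries only).
XML = """<session xmlns="http://winscp.net/schema/session/1.0">
  <ls>
    <files>
      <file>
        <filename value="package_tsp200715092001_20200715074120.tar" />
        <modification value="2020-07-15T06:41:12.000Z" />
      </file>
      <file>
        <filename value="package_tsp200715092001_20200715074120" />
        <modification value="2020-07-15T06:41:59.000Z" />
      </file>
    </files>
  </ls>
</session>"""

root = ET.fromstring(XML)
first_file = root.find('.//{*}file')

# '..' cannot climb above the element findall() is called on, so nothing matches.
parent_matches = first_file.findall('../{*}filename')
# '.' starts at the <file> element itself and finds its <filename> child.
child_matches = first_file.findall('./{*}filename')

dates = []
for filetag in root.findall('.//{*}file'):
    filename = filetag.find('./{*}filename')
    modification = filetag.find('./{*}modification')
    # Skip entries that lack either child instead of raising AttributeError.
    if filename is None or modification is None:
        continue
    if '.tar' in filename.get('value', ''):
        # Collect the modification time, not the filename element.
        dates.append(modification.get('value'))

print(dates)
```

Note that this collects the value of the sibling modification element, since that is what the question asks to print.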

Anyway, you're probably better off using namespaces "properly" instead of looking for shortcuts. It's a bit more verbose but hardly difficult:

NS = {'scp': 'http://winscp.net/schema/session/1.0'}
for filetag in root.findall('.//scp:file', NS):
    for filename in filetag.findall('./scp:filename', NS):
        if ".tar" in filename.get('value', ''):
            print(filename)
            dates.append(filename)
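The loop above keeps the structure of the code in the question, so it still collects the filename elements. To get the modification time instead, something along these lines should work. This is only a sketch: the function name tar_modification_times and the trimmed SAMPLE document are made up for illustration, and it does not rely on the namespace wildcard syntax:

```python
import xml.etree.ElementTree as ET

# Prefix map: 'scp' is an arbitrary local alias for the WinSCP namespace URI.
NS = {'scp': 'http://winscp.net/schema/session/1.0'}


def tar_modification_times(root):
    """Return modification values of <file> entries whose filename contains '.tar'."""
    times = []
    for filetag in root.findall('.//scp:file', NS):
        filename = filetag.find('./scp:filename', NS)
        modification = filetag.find('./scp:modification', NS)
        # find() returns None when the child is missing, so guard before use.
        if filename is None or modification is None:
            continue
        if '.tar' in filename.get('value', ''):
            times.append(modification.get('value'))
    return times


# Trimmed copy of the XML from the question (hypothetical sample for this sketch).
SAMPLE = """<session xmlns="http://winscp.net/schema/session/1.0">
  <ls>
    <files>
      <file>
        <filename value=".." />
        <modification value="2020-07-22T08:51:15.000Z" />
      </file>
      <file>
        <filename value="package_tsp200715092001_20200715074120.tar" />
        <modification value="2020-07-15T06:41:12.000Z" />
      </file>
    </files>
  </ls>
</session>"""

root = ET.fromstring(SAMPLE)
print(tar_modification_times(root))
```

With the real file you would pass tree.getroot() from ET.parse(...) instead of parsing SAMPLE.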

Upvotes: 2
