user6089877

Need help parsing an XML file

I'm trying to parse an XML file and I'm stuck on one point.

Here is a sample of my XML file:

<editrust>
  <flux ref='ITFR2006' sens='IN'>
    <intervalle ref='H10'>
      <terminé>1</terminé>
      <prisEnComtpe>1</prisEnComtpe>
    </intervalle>
    <intervalle ref='H60'>
      <terminé>11</terminé>
      <prisEnComtpe>11</prisEnComtpe>
    </intervalle>
    <intervalle ref='D1'>
      <terminé>150</terminé>
      <prisEnComtpe>150</prisEnComtpe>
    </intervalle>
    <intervalle ref='D2'>
      <terminé>150</terminé>
      <prisEnComtpe>150</prisEnComtpe>
    </intervalle>
  </flux>

  <flux ref='ITFR2007_2021' sens='IN'>
    <intervalle ref='H10'>
      <terminé>2</terminé>
      <prisEnComtpe>2</prisEnComtpe>
    </intervalle>
    <intervalle ref='H60'>
      <terminé>181</terminé>
      <prisEnComtpe>121</prisEnComtpe>
    </intervalle>
    <intervalle ref='D1'>
      <terminé>600</terminé>
      <prisEnComtpe>600</prisEnComtpe>
    </intervalle>
    <intervalle ref='D2'>
      <terminé>600</terminé>
      <prisEnComtpe>600</prisEnComtpe>
    </intervalle>
  </flux>
...

I want to build a dictionary of lists, like this:

{'ITFR2006': ['IN', 'H10', '1','1', 'H60', '11', '11', 'D1', '150', '150'],...

I wrote this script:

import xml.etree.ElementTree as etree
tree = etree.parse('fichier.xml')
root = tree.getroot()

flux = {}

def findText(node):

    for child in node:

        if child.attrib.get("ref"):

            if "ITFR" in child.attrib.get("ref"):
                itfr = child.attrib.get("ref")
                flux[itfr] = []

                print("\n-----------------\n")

            print(child.attrib.get("ref"))

        if child.attrib.get("sens"):
            flux[itfr].append(child.attrib.get("sens"))
            print(child.attrib.get("sens"))

        if child.text.strip():

            print(child.text.strip())

        findText(child)


findText(root)

print(flux)

The script prints this:

-----------------

ITFR2006
IN
H10
1
1
H60
11
11
D1
150
150
D2
150
150

-----------------

ITFR2007_2021
IN
H10
2
2
H60
181
121
D1
600
600
D2
600
600
....

And print(flux) gives:

{'ITFR2006': ['IN'], 'ITFR2007_2021': ['IN'], 'ITFR2008': ['IN'], 'ITFR2011_2020': ['IN'], 'ITFR2012': ['OUT'], 'ITFR2013': ['OUT'], 'ITFR2014': ['OUT'], 'ITFR2017': ['OUT'], 'ITFR2018': ['OUT'], 'ITFR2019': ['OUT'], 'ITFR2023': ['OUT'], 'ITFR2024': ['OUT']}

I think this is a good start, but I can't get the other values ('H10', '1', '1', 'H60', ...) into my lists.

Any ideas on how to finish this?

Thanks

Upvotes: 1

Views: 73

Answers (1)

mzjn

Reputation: 51012

Here is a way to do it (tested with Python 3.6):

import xml.etree.ElementTree as etree
import pprint

tree = etree.parse('fichier.xml')
fluxdict = {}

for flux in tree.findall("flux"):
    # The key
    key = flux.get("ref")
    # Add first item to the list
    val = [flux.get("sens")]

    for intervalle in flux.findall("intervalle"):
        ref = intervalle.get("ref")
        termine = intervalle.findtext("terminé")
        prisEnComtpe = intervalle.findtext("prisEnComtpe")

        # Add items by extending list
        val.extend([ref, termine, prisEnComtpe])

    # Add key:val pair for this 'flux'
    fluxdict[key] = val

pprint.pprint(fluxdict)

Output:

{'ITFR2006': ['IN',
              'H10',
              '1',
              '1',
              'H60',
              '11',
              '11',
              'D1',
              '150',
              '150',
              'D2',
              '150',
              '150'],
 'ITFR2007_2021': ['IN',
                   'H10',
                   '2',
                   '2',
                   'H60',
                   '181',
                   '121',
                   'D1',
                   '600',
                   '600',
                   'D2',
                   '600',
                   '600']}
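
As for why the script in the question only collects 'IN': it prints the inner ref values and the element text but never appends them, and itfr is a local variable, so the recursive calls on the child elements have no way to know which list they belong to. If you would rather keep the recursive approach, here is a minimal sketch that passes the current list down through the recursion. The names collect, current and SAMPLE are mine, the inline string is a trimmed copy of the question's data standing in for fichier.xml, and I am assuming every list starts at a flux element (I test the tag name instead of looking for "ITFR" in the ref).

```python
import xml.etree.ElementTree as etree

# Inline sample standing in for 'fichier.xml' (trimmed copy of the question's data).
SAMPLE = """<editrust>
  <flux ref='ITFR2006' sens='IN'>
    <intervalle ref='H10'>
      <terminé>1</terminé>
      <prisEnComtpe>1</prisEnComtpe>
    </intervalle>
    <intervalle ref='H60'>
      <terminé>11</terminé>
      <prisEnComtpe>11</prisEnComtpe>
    </intervalle>
  </flux>
</editrust>"""


def collect(node, flux, current=None):
    """Walk the tree and append values to the list of the enclosing <flux>."""
    for child in node:
        ref = child.get("ref")
        if child.tag == "flux":
            # Start a new list for this flux, seeded with its 'sens' attribute.
            current = flux[ref] = [child.get("sens")]
        elif ref is not None and current is not None:
            # An <intervalle> inside a flux: record its ref.
            current.append(ref)

        # child.text can be None for empty elements, so guard before strip().
        text = (child.text or "").strip()
        if text and current is not None:
            current.append(text)

        # Hand the current list down so nested elements can append to it.
        collect(child, flux, current)
    return flux


result = collect(etree.fromstring(SAMPLE), {})
print(result)
```

The (child.text or "") guard also avoids the AttributeError that child.text.strip() raises when an element has no text at all. For a fixed layout like this one, I still find the findall/findtext version above easier to read.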

Upvotes: 1
