bagere

Reputation: 266

Parsing XML with lxml in Python 3

I have this code and want to group animals with the same tag under one parent element, e.g. turn <dog/><dog/> into <dogs><dog/><dog/></dogs>, and likewise for the other animals. With my code, however, the animals are missing from the output entirely, and I have no idea why.

OUTPUT:

<root>
       <zoo>
           <some_tag/><some_diff/>
       </zoo>
       <zoo>
           <b/><o/>
       </zoo>
 </root>

CODE:

 xml = '''<root>
                  <zoo>
                      <some_tag/><some_diff/>
                      <dog/><dog/>
                      <cat/><cat/><cat/>
                  </zoo>
                  <zoo>
                      <b/><o/>
                      <dog/><dog/>
                      <cat/><cat/><cat/><cat/>
                  </zoo>
            </root>'''

from lxml import etree as et
root = et.fromstring(xml)
node = root.findall('./zoo')
j = False
k = False
for zoo in node:
    for animal in zoo:
        if 'dog' in animal.tag:
            if not j:
                dogs = et.SubElement(zoo,'dogs')
            dogs.append(animal)
            j = True
        if 'cat' in animal.tag:                        
            if not k:
                cats = et.SubElement(zoo,'cats')            
            cats.append(animal)
            k = True

    k = False
    j= False  

Upvotes: 1

Views: 978

Answers (1)

securecurve

Reputation: 5807

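I made some modifications to your script, and it works for me. The main change is comparing tags with == instead of in: 'dog' in animal.tag is also true for the new <dogs> wrapper, so the wrapper itself gets treated as an animal, which is most likely why your animals disappear. Check it out: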

xml = '''<root>
                  <zoo>
                      <some_tag/>
                      <some_diff/>
                      <dog/>
                      <dog/>
                      <cat/>
                      <cat/>
                      <cat/>
                  </zoo>

                  <zoo>
                      <b/>
                      <o/>
                      <dog/>
                      <dog/>
                      <cat></cat>
                      <cat></cat>
                  </zoo>
            </root>'''

from lxml import etree as et


root = et.fromstring(xml)

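# The three lines below have the same effect here, because <root> contains
# only <zoo> children; use whichever you like. Note that getchildren() is
# deprecated in lxml, and list(root) is the usual replacement.
node = root.findall('./zoo')
node = list( root.getchildren() )
node = root.getchildren()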


dogs_flag = False
cats_flag = False

for zoo in node:

    # Resetting the flags in each iteration, otherwise, you will 
    # have all the cats and dogs inside one zoo element ... try it yourself
    dogs_flag = False
    cats_flag = False

    for animal in zoo:

        if 'dog' == animal.tag:
            if not dogs_flag:
                dogs = et.SubElement(zoo,'dogs')
                dogs_flag = True    # I think this is a better place to set your flag                

            dogs.append(animal)


        if 'cat' == animal.tag:                        
            if not cats_flag:
                cats = et.SubElement(zoo,'cats')            
                cats_flag = True

            cats.append(animal)


# Python 3: print is a function, and tostring() returns bytes, so decode it
print(et.tostring(root, pretty_print=True).decode())

This gives the following output:

        <root>
              <zoo>
                  <some_tag/>
                  <some_diff/>
                  <dogs>
                      <dog/>
                      <dog/>
                  </dogs>
                  <cats>
                      <cat/>
                      <cat/>
                      <cat/>
                  </cats>
              </zoo>

              <zoo>
                  <b/>
                  <o/>
                  <dogs>
                      <dog/>
                      <dog/>
                  </dogs>
                  <cats>
                      <cat/>
                      <cat/>
                  </cats>
              </zoo>
        </root>
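
A variation that avoids changing a `<zoo>` while looping over it, offered as a rough sketch rather than a drop-in replacement: take a snapshot of the children with `list(zoo)` first, then move the matching ones into their wrappers. The helper name `group_children`, the shortened sample XML, and the naive `tag + "s"` plural are assumptions made for illustration. The import falls back to the standard library's `xml.etree.ElementTree` if lxml is not installed, since the calls used here exist in both.

```python
try:
    from lxml import etree as et
except ImportError:
    # Assumption: the standard library is close enough for the calls used below.
    import xml.etree.ElementTree as et

# Shortened sample input (hypothetical, modelled on the question's XML).
XML = (
    "<root>"
    "<zoo><some_tag/><dog/><dog/><cat/></zoo>"
    "<zoo><b/><cat/><dog/></zoo>"
    "</root>"
)


def group_children(parent, tags):
    """Move children of `parent` whose tag is in `tags` into <tag + 's'> wrappers."""
    wrappers = {}
    # list(parent) is a snapshot, so moving elements does not disturb the loop,
    # and the freshly created wrappers are never visited.
    for child in list(parent):
        if child.tag not in tags:  # exact match, so 'dogs' never counts as 'dog'
            continue
        if child.tag not in wrappers:
            wrappers[child.tag] = et.SubElement(parent, child.tag + "s")
        # Detach explicitly before re-attaching; this works in lxml and in the stdlib.
        parent.remove(child)
        wrappers[child.tag].append(child)


root = et.fromstring(XML)
for zoo in root.findall("./zoo"):
    group_children(zoo, {"dog", "cat"})

print(et.tostring(root).decode())
```

Keeping the wrappers in a dict keyed by tag replaces the per-animal flags, so supporting another animal only means adding its tag to the set passed to `group_children`.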

Upvotes: 1
