Suhas V

Reputation: 21

Is there Python code to merge two XML tags that have the same id and are present in the same file?

<?xml version="1.0"?>
<document>
    <page id='1'>
        <Foto ID="1" type="jpeg" r="0" g="0" b="0"/>
        <Foto ID="2" type="jpeg" r="0" g="0" b="0"/>
        <Foto ID="3" type="jpeg" r="0" g="0" b="0"/>
    </page>
    <page id='1'>
        <Foto ID="1" type="jpeg" r="0" g="0" b="0"/>
        <Foto ID="2" type="jpeg" r="0" g="0" b="0"/>
        <Foto ID="3" type="jpeg" r="0" g="0" b="0"/>
    </page>
    <page id='2'>
        <Foto ID="1" type="jpeg" r="0" g="0" b="0"/>
        <Foto ID="2" type="jpeg" r="0" g="0" b="0"/>
        <Foto ID="3" type="jpeg" r="0" g="0" b="0"/>
    </page>
</document>

Expected Output:

<?xml version="1.0"?>
<document>
    <page>
        <Foto ID="1" type="jpeg" r="0" g="0" b="0"/>
        <Foto ID="2" type="jpeg" r="0" g="0" b="0"/>
        <Foto ID="3" type="jpeg" r="0" g="0" b="0"/>
        <Foto ID="1" type="jpeg" r="0" g="0" b="0"/>
        <Foto ID="2" type="jpeg" r="0" g="0" b="0"/>
        <Foto ID="3" type="jpeg" r="0" g="0" b="0"/>
    </page>
    <page id='2'>
        <Foto ID="1" type="jpeg" r="0" g="0" b="0"/>
        <Foto ID="2" type="jpeg" r="0" g="0" b="0"/>
        <Foto ID="3" type="jpeg" r="0" g="0" b="0"/>
    </page>
</document>

I tried lxml.etree.parse, but it only helps me get the sub-elements and the attribute values. I want to merge the tags that have the same id. This is just an example.

Thank you in advance.

Upvotes: 0

Views: 170

Answers (1)

balderman

Reputation: 23815

See below. The idea is to collect the sub-elements of all pages that share the same id, remove the original pages, and then recreate one page per id.

import xml.etree.ElementTree as ET
from collections import defaultdict

XML = '''<?xml version="1.0"?>
<document>
    <page id='1'>
        <Foto ID="1" type="jpeg" r="0" g="0" b="0"/>
        <Foto ID="2" type="jpeg" r="0" g="0" b="0"/>
        <Foto ID="3" type="jpeg" r="0" g="0" b="0"/>
    </page>
    <page id='1'>
        <Foto ID="1" type="jpeg" r="0" g="0" b="0"/>
        <Foto ID="2" type="jpeg" r="0" g="0" b="0"/>
        <Foto ID="3" type="jpeg" r="0" g="0" b="0"/>
    </page>
    <page id='2'>
        <Foto ID="1" type="jpeg" r="0" g="0" b="0"/>
        <Foto ID="2" type="jpeg" r="0" g="0" b="0"/>
        <Foto ID="3" type="jpeg" r="0" g="0" b="0"/>
    </page>
</document>'''

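root = ET.fromstring(XML)
# Map each page id to the Foto elements of every page carrying that id.
ids_to_foto = defaultdict(list)
# Note: root.remove() below only works for pages that are direct children of root.
pages = root.findall('.//page')
for page in pages:
    ids_to_foto[page.attrib['id']].extend(list(page))
    root.remove(page)
# Recreate one page per id (dicts keep insertion order, so ids stay in order of first appearance).
for _id, fotos in ids_to_foto.items():
    el = ET.SubElement(root, 'page', attrib={'id': _id})
    for f in fotos:
        el.append(f)
# Print the resulting tree to stdout.
ET.dump(root)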

Output (pretty-printed for readability; ET.dump itself does not emit the XML declaration):

<?xml version="1.0" encoding="UTF-8"?>
<document>
   <page id="1">
      <Foto ID="1" type="jpeg" r="0" g="0" b="0" />
      <Foto ID="2" type="jpeg" r="0" g="0" b="0" />
      <Foto ID="3" type="jpeg" r="0" g="0" b="0" />
      <Foto ID="1" type="jpeg" r="0" g="0" b="0" />
      <Foto ID="2" type="jpeg" r="0" g="0" b="0" />
      <Foto ID="3" type="jpeg" r="0" g="0" b="0" />
   </page>
   <page id="2">
      <Foto ID="1" type="jpeg" r="0" g="0" b="0" />
      <Foto ID="2" type="jpeg" r="0" g="0" b="0" />
      <Foto ID="3" type="jpeg" r="0" g="0" b="0" />
   </page>
</document>
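
As a variant on the same idea, here is a minimal sketch that merges in place instead of recreating the pages: the first page with a given id is kept (along with any other attributes it has), and the children of later pages with that id are moved into it. The function name merge_pages and the shortened sample XML are made up for illustration. It assumes pages are direct children of the root and that pages without an id should be left alone. ET.indent is only available in Python 3.9+, so it is guarded.

```python
import xml.etree.ElementTree as ET


def merge_pages(root):
    """Merge <page> children of root that share an id, in place.

    merge_pages is a hypothetical helper name, not a library function.
    """
    first_by_id = {}
    # Iterate over a snapshot, because root is modified inside the loop.
    for page in list(root.findall('page')):
        page_id = page.get('id')
        if page_id is None:
            continue  # assumption: pages without an id are left untouched
        keeper = first_by_id.setdefault(page_id, page)
        if keeper is not page:
            # Move the duplicate's children into the first page with this id,
            # then drop the now redundant duplicate.
            keeper.extend(list(page))
            root.remove(page)
    return root


# Shortened sample document (illustrative only).
XML = '''<document>
    <page id='1'><Foto ID="1"/><Foto ID="2"/></page>
    <page id='1'><Foto ID="3"/></page>
    <page id='2'><Foto ID="4"/></page>
</document>'''

root = merge_pages(ET.fromstring(XML))
if hasattr(ET, 'indent'):  # Python 3.9+
    ET.indent(root)
print(ET.tostring(root, encoding='unicode'))
```

With this approach the merged page stays at the position of its first occurrence, which may matter if the document contains other elements between the pages.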

Upvotes: 1
