Reputation: 21
<?xml version="1.0"?>
<document>
<page id='1'>
<Foto ID="1" type="jpeg" r="0" g="0" b="0"/>
<Foto ID="2" type="jpeg" r="0" g="0" b="0"/>
<Foto ID="3" type="jpeg" r="0" g="0" b="0"/>
</page>
<page id='1'>
<Foto ID="1" type="jpeg" r="0" g="0" b="0"/>
<Foto ID="2" type="jpeg" r="0" g="0" b="0"/>
<Foto ID="3" type="jpeg" r="0" g="0" b="0"/>
</page>
<page id='2'>
<Foto ID="1" type="jpeg" r="0" g="0" b="0"/>
<Foto ID="2" type="jpeg" r="0" g="0" b="0"/>
<Foto ID="3" type="jpeg" r="0" g="0" b="0"/>
</page>
</document>
Expected Output:
<?xml version="1.0"?>
<document>
<page id='1'>
<Foto ID="1" type="jpeg" r="0" g="0" b="0"/>
<Foto ID="2" type="jpeg" r="0" g="0" b="0"/>
<Foto ID="3" type="jpeg" r="0" g="0" b="0"/>
<Foto ID="1" type="jpeg" r="0" g="0" b="0"/>
<Foto ID="2" type="jpeg" r="0" g="0" b="0"/>
<Foto ID="3" type="jpeg" r="0" g="0" b="0"/>
</page>
<page id='2'>
<Foto ID="1" type="jpeg" r="0" g="0" b="0"/>
<Foto ID="2" type="jpeg" r="0" g="0" b="0"/>
<Foto ID="3" type="jpeg" r="0" g="0" b="0"/>
</page>
</document>
I tried lxml.etree.parse, but that only gets me the sub-elements and the attribute values. I want to merge the page tags that have the same id. This is just an example.
Thank you in advance.
Upvotes: 0
Views: 170
Reputation: 23815
See below. The idea is to collect all sub-elements that belong to the same page id, remove the original pages, and then recreate one page per id.
import xml.etree.ElementTree as ET
from collections import defaultdict
XML = '''<?xml version="1.0"?>
<document>
<page id='1'>
<Foto ID="1" type="jpeg" r="0" g="0" b="0"/>
<Foto ID="2" type="jpeg" r="0" g="0" b="0"/>
<Foto ID="3" type="jpeg" r="0" g="0" b="0"/>
</page>
<page id='1'>
<Foto ID="1" type="jpeg" r="0" g="0" b="0"/>
<Foto ID="2" type="jpeg" r="0" g="0" b="0"/>
<Foto ID="3" type="jpeg" r="0" g="0" b="0"/>
</page>
<page id='2'>
<Foto ID="1" type="jpeg" r="0" g="0" b="0"/>
<Foto ID="2" type="jpeg" r="0" g="0" b="0"/>
<Foto ID="3" type="jpeg" r="0" g="0" b="0"/>
</page>
</document>'''
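root = ET.fromstring(XML)
# map each page id to the Foto elements found under pages with that id
ids_to_foto = defaultdict(list)
# note: root.remove() only works on direct children, so this assumes
# every page sits directly under the root element
pages = root.findall('.//page')
for page in pages:
    ids_to_foto[page.attrib['id']].extend(list(page))
    root.remove(page)
# recreate the pages, one per distinct id
for _id, fotos in ids_to_foto.items():
    el = ET.SubElement(root, 'page', attrib={'id': _id})
    for f in fotos:
        el.append(f)
ET.dump(root)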
Output (pretty-printed):
<?xml version="1.0" encoding="UTF-8"?>
<document>
<page id="1">
<Foto ID="1" type="jpeg" r="0" g="0" b="0" />
<Foto ID="2" type="jpeg" r="0" g="0" b="0" />
<Foto ID="3" type="jpeg" r="0" g="0" b="0" />
<Foto ID="1" type="jpeg" r="0" g="0" b="0" />
<Foto ID="2" type="jpeg" r="0" g="0" b="0" />
<Foto ID="3" type="jpeg" r="0" g="0" b="0" />
</page>
<page id="2">
<Foto ID="1" type="jpeg" r="0" g="0" b="0" />
<Foto ID="2" type="jpeg" r="0" g="0" b="0" />
<Foto ID="3" type="jpeg" r="0" g="0" b="0" />
</page>
</document>
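A possible variant, not part of the code above: instead of removing every page and rebuilding, keep the first page seen for each id and move the children of later duplicates into it. This is only a sketch. The function name merge_pages_in_place and the shortened SAMPLE document are made up for illustration, and it assumes all page elements are direct children of the root (pages without an id would all be merged together).

```python
import xml.etree.ElementTree as ET


def merge_pages_in_place(root):
    """Merge sibling <page> elements that share an id into the first one.

    Hypothetical helper; assumes every <page> is a direct child of root.
    """
    first_by_id = {}  # page id -> first <page> element seen with that id
    # findall() returns a list, so removing elements inside the loop is safe
    for page in root.findall('page'):
        keeper = first_by_id.setdefault(page.get('id'), page)
        if keeper is not page:
            # Move the duplicate's children, then drop the duplicate page
            keeper.extend(list(page))
            root.remove(page)
    return root


# Shortened sample input (illustrative, not the data from the question)
SAMPLE = """<document>
<page id='1'><Foto ID="1"/><Foto ID="2"/></page>
<page id='1'><Foto ID="3"/></page>
<page id='2'><Foto ID="4"/></page>
</document>"""

root = merge_pages_in_place(ET.fromstring(SAMPLE))
merged = {p.get('id'): [f.get('ID') for f in p] for p in root.findall('page')}
print(merged)  # {'1': ['1', '2', '3'], '2': ['4']}
```

Because the first page element is reused rather than recreated, any other attributes on it are kept, and the pages stay in their original document order.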
Upvotes: 1