Merge child nodes with the similar parent node, xml, python

Question

I have the following XML file:


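<root>
    <article_date>09/09/2013
    <article_time>1
        <article_name>aaa1</article_name>
        <article_link>1aaaaaaa</article_link>
    </article_time>
    <article_time>0
        <article_name>aaa2</article_name>
        <article_link>2aaaaaaa</article_link>
    </article_time>
    <article_time>1
        <article_name>aaa3</article_name>
        <article_link>3aaaaaaa</article_link>
    </article_time>
    <article_time>0
        <article_name>aaa4</article_name>
        <article_link>4aaaaaaa</article_link>
    </article_time>
    <article_time>1
        <article_name>aaa5</article_name>
        <article_link>5aaaaaaa</article_link>
    </article_time>
    </article_date>
</root>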

I would like to transform it into the following file:


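<root>
    <article_date>09/09/2013
    <article_time>1
        <article_name>aaa1+aaa3+aaa5</article_name>
        <article_link>1aaaaaaa+3aaaaaaa+5aaaaaaa</article_link>
    </article_time>
    <article_time>0
        <article_name>aaa2+aaa4</article_name>
        <article_link>2aaaaaaa+4aaaaaaa</article_link>
    </article_time>
    </article_date>
</root>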

How can I do this in Python?

My approach to this task is the following:
1) loop through the article_time tags
2) build a dictionary: key is the time value (either 0 or 1), value is the matching article_time elements
3) for each entry in this dictionary, find the child nodes article_name and article_link and append their text

Based on that, I wrote the following code (P.S. I am currently struggling with adding elements to the dictionary, but I will work that out):

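import os
import xml.etree.ElementTree as et

def parse():
    list_of_unique_timestamps = []
    text_to_merge = ""
    # et.parse() does not expand "~", so expand it explicitly
    tree = et.parse(os.path.expanduser("~/Documents/test1.xml"))
    root = tree.getroot()
    for children in root:
        print(children.tag, children.text)
        for child in children:
            print(child.tag, int(child.text))
            # remember each distinct article_time value once
            if child.text not in list_of_unique_timestamps:
                list_of_unique_timestamps.append(child.text)
    print(list_of_unique_timestamps)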

alecxe · Accepted Answer

Here's a solution using xml.etree.ElementTree from the Python standard library.

The idea is to gather the items into a defaultdict(list) keyed by the article_time text value:

from collections import defaultdict
import xml.etree.ElementTree as ET

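data = """<root>
    <article_date>09/09/2013
    <article_time>1
        <article_name>aaa1</article_name>
        <article_link>1aaaaaaa</article_link>
    </article_time>
    <article_time>0
        <article_name>aaa2</article_name>
        <article_link>2aaaaaaa</article_link>
    </article_time>
    <article_time>1
        <article_name>aaa3</article_name>
        <article_link>3aaaaaaa</article_link>
    </article_time>
    <article_time>0
        <article_name>aaa4</article_name>
        <article_link>4aaaaaaa</article_link>
    </article_time>
    <article_time>1
        <article_name>aaa5</article_name>
        <article_link>5aaaaaaa</article_link>
    </article_time>
    </article_date>
</root>
"""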

tree = ET.fromstring(data)

root = ET.Element('root')
article_date = ET.SubElement(root, 'article_date')
article_date.text = tree.find('.//article_date').text

data = defaultdict(list)
for article_time in tree.findall('.//article_time'):
    text = article_time.text.strip()
    name = article_time.find('./article_name').text
    link = article_time.find('./article_link').text
    data[text].append((name, link))

for time_value, items in data.items():
    article_time = ET.SubElement(article_date, 'article_time')
    article_name = ET.SubElement(article_time, 'article_name')
    article_link = ET.SubElement(article_time, 'article_link')

    article_time.text = time_value
    article_name.text = '+'.join(name for (name, _) in items)
    article_link.text = '+'.join(link for (_, link) in items)

# encoding='unicode' makes tostring() return a str instead of bytes (Python 3)
print(ET.tostring(root, encoding='unicode'))

prints (prettified):


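<root>
    <article_date>09/09/2013
        <article_time>1
            <article_name>aaa1+aaa3+aaa5</article_name>
            <article_link>1aaaaaaa+3aaaaaaa+5aaaaaaa</article_link>
        </article_time>
        <article_time>0
            <article_name>aaa2+aaa4</article_name>
            <article_link>2aaaaaaa+4aaaaaaa</article_link>
        </article_time>
    </article_date>
</root>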

As you can see, the result is exactly what you were aiming for.
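
If you need this more than once, the same grouping idea can be wrapped in a function. What follows is only a minimal sketch, assuming Python 3.7+ (where dicts keep insertion order, so groups come out in order of first appearance) and the tag names used above. The names merge_article_times, sep and sample are made up for illustration and are not part of any library.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def merge_article_times(xml_text, sep="+"):
    """Group <article_time> elements by their text and join the child texts."""
    source = ET.fromstring(xml_text)

    # Collect (name, link) pairs per time value.
    groups = defaultdict(list)
    for time_el in source.iter("article_time"):
        key = (time_el.text or "").strip()
        name = time_el.findtext("article_name", default="")
        link = time_el.findtext("article_link", default="")
        groups[key].append((name, link))

    # Build the merged tree: one <article_time> per distinct time value.
    root = ET.Element("root")
    date_el = ET.SubElement(root, "article_date")
    date_el.text = (source.findtext(".//article_date") or "").strip()
    for key, items in groups.items():
        time_el = ET.SubElement(date_el, "article_time")
        time_el.text = key
        ET.SubElement(time_el, "article_name").text = sep.join(n for n, _ in items)
        ET.SubElement(time_el, "article_link").text = sep.join(k for _, k in items)
    return root


# Hypothetical sample input with the same shape as the file in the question.
sample = """<root>
    <article_date>09/09/2013
    <article_time>1
        <article_name>aaa1</article_name>
        <article_link>1aaaaaaa</article_link>
    </article_time>
    <article_time>0
        <article_name>aaa2</article_name>
        <article_link>2aaaaaaa</article_link>
    </article_time>
    <article_time>1
        <article_name>aaa3</article_name>
        <article_link>3aaaaaaa</article_link>
    </article_time>
    <article_time>0
        <article_name>aaa4</article_name>
        <article_link>4aaaaaaa</article_link>
    </article_time>
    <article_time>1
        <article_name>aaa5</article_name>
        <article_link>5aaaaaaa</article_link>
    </article_time>
    </article_date>
</root>"""

merged = merge_article_times(sample)
print(ET.tostring(merged, encoding="unicode"))
```

Using findtext with a default means an article_time that lacks a name or link contributes an empty string instead of raising an AttributeError. Whether that is the right behaviour depends on your data.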
