Matthieu Veron

Reputation: 290

Compare two XML files with good performance using Python

I would like to compare two XML files and produce a specific output, using Python.

Example:

old.xml

<foos>
    <foo>
        <id>1</id>
        <x>1</x>
    </foo>
    <foo>
        <id>2</id>
        <x>1</x>
    </foo>
    <foo>
        <id>3</id>
        <x>1</x>
        <y>1</y>
    </foo>
</foos>

new.xml

<foos>
    <foo>
        <id>1</id>
        <x>2</x>
        <y>1</y>
    </foo>
    <foo>
        <id>2</id>
        <x>1</x>
    </foo>
    <foo>
        <id>3</id>
        <x>2</x>
        <y>1</y>
    </foo>
    <foo>
        <id>4</id>
        <x>1</x>
    </foo>
</foos>

And the output I want:

output.xml

<foos>
    <foo>
        <id>1</id>
        <x>2</x>
        <y>1</y>
    </foo>
    <foo>
        <id>3</id>
        <x>2</x>
    </foo>
    <foo>
        <id>4</id>
        <x>1</x>
    </foo>
</foos>

I wrote a very ugly function that performs poorly, and I would like to find a better way to do this. Do you have any ideas for how to perform this task efficiently?

Some issues I had:

Upvotes: 1

Views: 205

Answers (1)

yazz

Reputation: 331

This may be an ugly method too, but here it is for your information.

import io
from simplified_scrapy import SimplifiedDoc, utils

def getChange(oldFile='old.xml', newFile='new.xml'):
    # Index every <foo> of the old file by its id: id -> (x, y)
    xmlOld = utils.getFileContent(oldFile)
    docOld = SimplifiedDoc(xmlOld)
    foo = docOld.selects('foo')
    dic = {}
    for f in foo:
        dic[f.id.text] = (f.x.text, f.y.text)

    # Walk the new file and keep only what is new or different
    xmlNew = utils.getFileContent(newFile)
    docNew = SimplifiedDoc(xmlNew)
    foo = docNew.selects('foo')
    change = {}
    for f in foo:
        old = dic.get(f.id.text)
        if not old:
            # id not present in the old file: keep the whole entry
            change[f.id.text] = (f.x.text, f.y.text)
        else:
            new = (f.x.text, f.y.text)
            # An empty string marks a field that did not change
            if old[0] != new[0] and old[1] != new[1]:
                change[f.id.text] = (f.x.text, f.y.text)
            elif old[0] != new[0]:
                change[f.id.text] = (f.x.text, '')
            elif old[1] != new[1]:
                change[f.id.text] = ('', f.y.text)
    return change


def saveFile(change, output='output.xml'):
    with io.open(output, mode='w') as file:
        file.write(u'<foos>\n')
        for k, v in change.items():
            file.write('<foo><id>{}</id>'.format(k))
            if v[0]:
                file.write('<x>{}</x>'.format(v[0]))
            if v[1]:
                file.write('<y>{}</y>'.format(v[1]))
            file.write('</foo>\n')
        file.write('</foos>\n')


saveFile(getChange())
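
For comparison, here is a minimal sketch of the same idea using only the standard library's xml.etree.ElementTree instead of simplified_scrapy. It assumes both files are well-formed, that every <foo> has a unique <id>, and that the other children are flat text elements such as <x> and <y>. The names index_by_id, diff_foos and write_diff are my own, chosen for illustration. The old file is indexed in a dict once, so each lookup is constant time and the whole comparison is roughly linear in the number of <foo> elements.

```python
import xml.etree.ElementTree as ET


def index_by_id(root):
    """Map each <foo>'s <id> text to a {child tag: text} dict (id excluded)."""
    return {
        foo.findtext('id'): {c.tag: c.text for c in foo if c.tag != 'id'}
        for foo in root.iter('foo')
    }


def diff_foos(old_root, new_root):
    """Return a <foos> element holding only the new or changed fields."""
    old = index_by_id(old_root)
    out = ET.Element('foos')
    for foo_id, fields in index_by_id(new_root).items():
        before = old.get(foo_id)
        if before is None:
            # id not present in the old file: keep every field
            changed = fields
        else:
            # keep only fields that were added or whose text differs
            changed = {t: v for t, v in fields.items()
                       if t not in before or before[t] != v}
        if not changed:
            continue
        foo = ET.SubElement(out, 'foo')
        ET.SubElement(foo, 'id').text = foo_id
        for tag, text in changed.items():
            ET.SubElement(foo, tag).text = text
    return out


def write_diff(old_path='old.xml', new_path='new.xml', out_path='output.xml'):
    """Parse both files, diff them, and write the result to out_path."""
    old_root = ET.parse(old_path).getroot()
    new_root = ET.parse(new_path).getroot()
    ET.ElementTree(diff_foos(old_root, new_root)).write(out_path)
```

Calling write_diff() with the default paths reads old.xml and new.xml and writes output.xml. The result is not indented. On Python 3.9 or later, ET.indent can be applied to the tree before writing if pretty-printed output matters.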

Upvotes: 1
