Manu
Manu

Reputation: 77

How to compare two xml files in python in script?

I am new python. I have some predefined xml files. I have a script which generate new xml files. I want to write an automated script which compares xmls files and stores the name of differing xml file names in output file? Thanks in advance

Upvotes: 0

Views: 13963

Answers (4)

Pankaj Raheja
Pankaj Raheja

Reputation: 1

What about the following snippet :

def separator(self):
    return "!@#$%^&*" # Very ugly separator

def _traverseXML(self, xmlElem, tags, xpaths):
    tags.append(xmlElem.tag)
    for e in xmlElem:
        self._traverseXML(e, tags, xpaths)

    text = ''
    if (xmlElem.text):
        text = xmlElem.text.strip()

    xpaths.add("/".join(tags) + self.separator() + text)
    tags.pop()

def _xmlToSet(self, xml):
    xpaths = set() # output
    tags = list()
    root = ET.fromstring(xml)
    self._traverseXML(root, tags, xpaths)

    return xpaths

def _areXMLsAlike(self, xml1, xml2):
    xpaths1 = self._xmlToSet(xml1)
    xpaths2 = self._xmlToSet(xml2)

    return xpaths1 == xpaths2

Upvotes: 0

janbrohl
janbrohl

Reputation: 2656

Do you speak of comparing them byte-wise or for semantic equality? (Is <tag attr1="1" attr2="2" /> equal to <tag attr2="2" attr1="1" />?) If you want to check for semantic equality have a look at Xml comparison in Python

When generating xml especially if using normal dicts for the attributes somewhere attribute order can be mixed up even sometimes when you use the same script with the same input.

items()

...

CPython implementation detail: Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions.

Upvotes: 2

Brionius
Brionius

Reputation: 14118

Building on @Xaranke's answer:

import filecmp

out_file = open("diff_xml_names.txt")
# Not sure what format your filenames will come in, but here's one possibility.
filePairs = [('f1a.xml', 'f1b.xml'), ('f2a.xml', 'f2b.xml'), ('f3a.xml', 'f3b.xml')]

for f1, f2 in filePairs:
    if not filecmp.cmp(f1, f2):
        # Files are not equal
        out_file.write(f1+'\n')

out_file.close()

Upvotes: 0

user2286078
user2286078

Reputation:

I think you're looking for the filecmp module. You can use it like this:

import filecmp
cmp = filecmp.cmp('f1.xml', 'f2.xml')

# Files are equal
if cmp:
    continue
else:
    out_file.write('f1.xml') 

Replace f1.xml and f2.xml with your xml files.

Upvotes: 2

Related Questions