Radheya

Reputation: 821

Confusion about XML comparison using Python

I am trying to compare the two XML files given below in Python and would like your input on my approach.

File 1:

<p1:car>                           
    <p1:feature car="111" type="color">511</p1:feature>
    <p1:feature car="223" type="color">542</p1:feature>
    <p1:feature car="299" type="color">559</p1:feature>
    <p1:feature car="323" type="color">564</p1:feature>
    <p1:feature car="353" type="color">564</p1:feature>
    <p1:feature car="391" type="color">570</p1:feature>
    <p1:feature car="448" type="color">570</p1:feature>

    <p1:feature car="111" type="tires" unit="percent">511</p1:feature>
    <p1:feature car="223" type="tires" unit="percent">513</p1:feature>
    <p1:feature car="299" type="tires" unit="percent">516</p1:feature>
    <p1:feature car="323" type="tires" unit="percent">516</p1:feature>
    <p1:feature car="353" type="tires" unit="percent">518</p1:feature>
    <p1:feature car="391" type="tires" unit="percent">520</p1:feature>
    <p1:feature car="448" type="tires" unit="percent">520</p1:feature>
</p1:car>

File 2:

<p1:car>                           
    <p1:feature car="111" type="color">511</p1:feature>
    <p1:feature car="223" type="color">542</p1:feature>
    <p1:feature car="299" type="color">559</p1:feature>
    <p1:feature car="323" type="color">564</p1:feature>
    <p1:feature car="353" type="color">564</p1:feature>
    <p1:feature car="391" type="color">570</p1:feature>
    <p1:feature car="448" type="color">570</p1:feature>

    <p1:feature car="223" type="tires" unit="percent">513</p1:feature>
    <p1:feature car="299" type="tires" unit="percent">516</p1:feature>
    <p1:feature car="323" type="tires" unit="percent">516</p1:feature>
    <p1:feature car="353" type="tires" unit="percent">518</p1:feature>
    <p1:feature car="391" type="tires" unit="percent">520</p1:feature>
    <p1:feature car="440" type="tires" unit="percent">520</p1:feature>
</p1:car>

If you look closely, the line <p1:feature car="111" type="tires" unit="percent">511</p1:feature>, which is present in the second group of File 1, is missing from File 2.

Also, in the last line of the second group, File 2 has car="440" whereas File 1 has car="448".

What I want:

The files I am dealing with contain many such differences, so how can I print out the missing lines and the unequal numbers? I want the output in the following form:

In file two feature car="111", type="tires" and text = 511 is missing
In file two car="440" whereas in file one it is car="448"
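One way to get output in exactly this form, sketched under some assumptions: group the features by their type attribute, then let difflib.SequenceMatcher align the two (car, text) sequences, so an element that was deleted is reported as missing and a one-for-one "replace" is reported as a changed car value. The sketch uses Python 3 and the standard library's xml.etree.ElementTree instead of lxml, and it assumes the real files declare the p1 prefix; the namespace URI urn:example:p1, the abridged sample strings and the helper names features_by_type and report are all hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the real files must declare xmlns:p1 somewhere.
NS = {"p1": "urn:example:p1"}


def features_by_type(xml_text):
    """Group (car, text) pairs by the feature's type attribute, in file order."""
    root = ET.fromstring(xml_text)
    groups = {}
    for feature in root.findall("p1:feature", NS):
        pair = (feature.get("car"), feature.text.strip())
        groups.setdefault(feature.get("type"), []).append(pair)
    return groups


def report(xml_1, xml_2):
    """Describe how file two differs from file one, one message per difference."""
    groups_1, groups_2 = features_by_type(xml_1), features_by_type(xml_2)
    # Keep file-one order, then any types that only appear in file two.
    types = list(groups_1) + [t for t in groups_2 if t not in groups_1]
    messages = []
    for ftype in types:
        a, b = groups_1.get(ftype, []), groups_2.get(ftype, [])
        matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "replace" and i2 - i1 == j2 - j1:
                # Same number of elements on both sides: report them pairwise.
                for (car_1, text_1), (car_2, text_2) in zip(a[i1:i2], b[j1:j2]):
                    messages.append(
                        f'In file two car="{car_2}", text = {text_2} whereas in file one '
                        f'it is car="{car_1}", text = {text_1} (type="{ftype}")'
                    )
                continue
            if tag in ("delete", "replace"):
                # Elements only present in file one.
                for car, text in a[i1:i2]:
                    messages.append(
                        f'In file two feature car="{car}", type="{ftype}" '
                        f'and text = {text} is missing'
                    )
            if tag in ("insert", "replace"):
                # Elements only present in file two.
                for car, text in b[j1:j2]:
                    messages.append(
                        f'In file two feature car="{car}", type="{ftype}" '
                        f'and text = {text} is extra'
                    )
    return messages


# Abridged copies of the question's files, with the (hypothetical) namespace declared.
FILE_1 = """<p1:car xmlns:p1="urn:example:p1">
    <p1:feature car="111" type="color">511</p1:feature>
    <p1:feature car="111" type="tires" unit="percent">511</p1:feature>
    <p1:feature car="223" type="tires" unit="percent">513</p1:feature>
    <p1:feature car="448" type="tires" unit="percent">520</p1:feature>
</p1:car>"""
FILE_2 = """<p1:car xmlns:p1="urn:example:p1">
    <p1:feature car="111" type="color">511</p1:feature>
    <p1:feature car="223" type="tires" unit="percent">513</p1:feature>
    <p1:feature car="440" type="tires" unit="percent">520</p1:feature>
</p1:car>"""

if __name__ == "__main__":
    for message in report(FILE_1, FILE_2):
        print(message)
```

Aligning within each type group keeps a missing "tires" entry from shifting the comparison of the remaining entries, which is what breaks a plain position-by-position loop.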

Suggestions for other ideas and methods are also welcome. I have been stuck on this for a long time and would like to solve it soon.

What I tried:

I am using lxml for the comparison and tried a for loop like this:

for i, j in zip(file1, file2):  # iterating an element yields its children (getchildren() is deprecated)
    # compare (car, text) as a tuple; `&` is a bitwise AND and would hide differences
    if (i.get("car"), i.text) != (j.get("car"), j.text):
        print("difference of both files")

Because this compares the files element by element in order, every result from the second group onward is wrong: one line is missing from the second file, so each following element gets paired with the wrong counterpart.
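A minimal sketch of a position-independent comparison, assuming each (type, car) pair occurs only once per file: index every feature by that key and compare the two dictionaries, so a missing element cannot shift the rest. It uses Python 3 and the standard library's xml.etree.ElementTree; lxml.etree implements the same findall/get API. The namespace URI urn:example:p1, the abridged samples and the names index_features and compare are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the real files must declare xmlns:p1 somewhere.
NS = {"p1": "urn:example:p1"}


def index_features(xml_text):
    """Map (type, car) -> text; assumes each (type, car) pair occurs once per file."""
    root = ET.fromstring(xml_text)
    return {
        (f.get("type"), f.get("car")): f.text.strip()
        for f in root.findall("p1:feature", NS)
    }


def compare(xml_1, xml_2):
    """Compare by key, not position, so a missing element cannot shift the rest."""
    one, two = index_features(xml_1), index_features(xml_2)
    missing = sorted(one.keys() - two.keys())  # keys only in file one
    extra = sorted(two.keys() - one.keys())    # keys only in file two
    changed = sorted(k for k in one.keys() & two.keys() if one[k] != two[k])
    return missing, extra, changed


# Abridged copies of the "tires" group from the question.
FILE_1 = """<p1:car xmlns:p1="urn:example:p1">
    <p1:feature car="111" type="tires" unit="percent">511</p1:feature>
    <p1:feature car="223" type="tires" unit="percent">513</p1:feature>
    <p1:feature car="448" type="tires" unit="percent">520</p1:feature>
</p1:car>"""
FILE_2 = """<p1:car xmlns:p1="urn:example:p1">
    <p1:feature car="223" type="tires" unit="percent">513</p1:feature>
    <p1:feature car="440" type="tires" unit="percent">520</p1:feature>
</p1:car>"""

if __name__ == "__main__":
    missing, extra, changed = compare(FILE_1, FILE_2)
    for ftype, car in missing:
        print(f'In file two feature car="{car}", type="{ftype}" is missing')
    for ftype, car in extra:
        print(f'In file two feature car="{car}", type="{ftype}" is extra')
    for ftype, car in changed:
        print(f'Text differs for car="{car}", type="{ftype}"')
```

Note that keying by car reports car="448" as missing and car="440" as extra rather than pairing them; producing "440 whereas 448" needs a sequence alignment such as difflib.SequenceMatcher.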

Upvotes: 0

Views: 119

Answers (1)

Fabio Menegazzo

Reputation: 1249

I think what you want is difflib. Please take a look at the official documentation.

In general terms, what you want is:

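from difflib import Differ
# compare() expects sequences of lines; passing whole strings diffs them character by character
text_1 = file_1.read().splitlines(True)  # XML contents of the first file, one item per line
text_2 = file_2.read().splitlines(True)  # XML contents of the second file
d = Differ()
result = d.compare(text_1, text_2)  # generator of lines prefixed "  ", "- ", "+ " or "? "
print("".join(result))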

For more details about usage please refer to the official documentation.
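A small usage sketch, assuming the two files have already been read into strings (the abridged excerpts below stand in for them, and changed_lines is a hypothetical helper): Differ marks lines that exist only in the first input with "- " and lines only in the second with "+ ", so keeping just those prefixes leaves the missing and changed lines. Because Differ compares raw text, differences in attribute order or whitespace will also show up as changes.

```python
from difflib import Differ

# Abridged excerpts of the two files from the question (inline for illustration).
TEXT_1 = """<p1:feature car="111" type="tires" unit="percent">511</p1:feature>
<p1:feature car="223" type="tires" unit="percent">513</p1:feature>
<p1:feature car="448" type="tires" unit="percent">520</p1:feature>
"""
TEXT_2 = """<p1:feature car="223" type="tires" unit="percent">513</p1:feature>
<p1:feature car="440" type="tires" unit="percent">520</p1:feature>
"""


def changed_lines(text_1, text_2):
    """Return only the lines Differ marks as removed ('- ') or added ('+ ')."""
    diff = Differ().compare(text_1.splitlines(), text_2.splitlines())
    # Drop unchanged lines ("  ") and the intraline hint lines ("? ").
    return [line for line in diff if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    for line in changed_lines(TEXT_1, TEXT_2):
        print(line)
```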

Upvotes: 2
