Reputation: 821
I am trying to compare the two XML files below in Python and would like your input on my approach.
File 1:
<p1:car>
<p1:feature car="111" type="color">511</p1:feature>
<p1:feature car="223" type="color">542</p1:feature>
<p1:feature car="299" type="color">559</p1:feature>
<p1:feature car="323" type="color">564</p1:feature>
<p1:feature car="353" type="color">564</p1:feature>
<p1:feature car="391" type="color">570</p1:feature>
<p1:feature car="448" type="color">570</p1:feature>
<p1:feature car="111" type="tires" unit="percent">511</p1:feature>
<p1:feature car="223" type="tires" unit="percent">513</p1:feature>
<p1:feature car="299" type="tires" unit="percent">516</p1:feature>
<p1:feature car="323" type="tires" unit="percent">516</p1:feature>
<p1:feature car="353" type="tires" unit="percent">518</p1:feature>
<p1:feature car="391" type="tires" unit="percent">520</p1:feature>
<p1:feature car="448" type="tires" unit="percent">520</p1:feature>
</p1:car>
File 2:
<p1:car>
<p1:feature car="111" type="color">511</p1:feature>
<p1:feature car="223" type="color">542</p1:feature>
<p1:feature car="299" type="color">559</p1:feature>
<p1:feature car="323" type="color">564</p1:feature>
<p1:feature car="353" type="color">564</p1:feature>
<p1:feature car="391" type="color">570</p1:feature>
<p1:feature car="448" type="color">570</p1:feature>
<p1:feature car="223" type="tires" unit="percent">513</p1:feature>
<p1:feature car="299" type="tires" unit="percent">516</p1:feature>
<p1:feature car="323" type="tires" unit="percent">516</p1:feature>
<p1:feature car="353" type="tires" unit="percent">518</p1:feature>
<p1:feature car="391" type="tires" unit="percent">520</p1:feature>
<p1:feature car="440" type="tires" unit="percent">520</p1:feature>
</p1:car>
If you look closely, File 2 has no line <p1:feature car="111" type="tires" unit="percent">511</p1:feature>
in the second group (the type="tires" entries), although it is present in File 1.
Also, the last line of the second group in File 2 has car="440",
whereas in File 1 it is car="448".
What I want:
The files I am dealing with contain numerous such differences. How can I print the missing lines and the mismatched numbers from these files? I want output in the following form:
In file two feature car="111", type="tires" and text = 511 is missing
In file two car="440" whereas in file one it is car="448"
I am also open to other ideas and methods. I have been stuck on this for a long time and would like to resolve it soon.
What I tried:
I am using lxml for the comparison and tried a for loop like this:
# file1 and file2 are the parsed root elements; iterating a root yields its children
for i, j in zip(file1, file2):
    # compare (car, text) as a pair; bitwise & would merge the two numbers into one
    if (int(i.get("car")), int(i.text)) != (int(j.get("car")), int(j.text)):
        print("difference of both files")
Because the comparison pairs lines by position, every result from the second group onward is wrong, since one line is missing from the second file.
Upvotes: 0
Views: 119
Reputation: 1249
I think what you want is difflib. Please take a look at the official documentation.
In general terms, what you want is:
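from difflib import Differ

# Differ.compare expects sequences of lines; whole strings would be diffed character by character
text_1 = file_1.read().splitlines()  # lines of the first XML file
text_2 = file_2.read().splitlines()  # lines of the second XML file
d = Differ()
result = d.compare(text_1, text_2)  # generator of lines prefixed with "- ", "+ ", "  " or "? "
print("\n".join(result))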
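To make the idea concrete, here is a minimal, self-contained sketch of a line-based diff with `Differ`. The two lists are shortened stand-ins for the files in the question, and `changed_lines` is a hypothetical helper name, not part of difflib. It keeps only the lines that `Differ` marks as removed (`- `) or added (`+ `).

```python
from difflib import Differ

# Shortened stand-ins for the two files (sample data taken from the question).
file_1_lines = [
    '<p1:feature car="111" type="tires" unit="percent">511</p1:feature>',
    '<p1:feature car="223" type="tires" unit="percent">513</p1:feature>',
    '<p1:feature car="448" type="tires" unit="percent">520</p1:feature>',
]
file_2_lines = [
    '<p1:feature car="223" type="tires" unit="percent">513</p1:feature>',
    '<p1:feature car="440" type="tires" unit="percent">520</p1:feature>',
]


def changed_lines(a, b):
    """Return only the Differ output lines that mark a difference.

    Lines starting with "- " exist only in a, lines starting with "+ "
    exist only in b. Unchanged lines ("  ") and hint lines ("? ") are dropped.
    """
    return [line for line in Differ().compare(a, b) if line[:2] in ("- ", "+ ")]


for line in changed_lines(file_1_lines, file_2_lines):
    print(line)
```

Because the diff aligns matching lines before reporting, the missing car="111" line no longer shifts every later comparison, which is the problem the positional `zip` loop runs into.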
For more details about usage please refer to the official documentation.
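If a text diff is too coarse, another option (not difflib, but a different technique) is to index each feature by its attributes and compare the two indexes. The following is only a sketch under assumptions: the namespace URI `urn:example:p1` is invented because the question does not show where the `p1` prefix is declared, the `(car, type)` pair is assumed to identify a feature uniquely, and `index_features` and `compare` are hypothetical helper names. It uses the standard library's `xml.etree.ElementTree`; the same calls should also work with `lxml.etree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the snippets in the question do not show
# the declaration of the p1 prefix, so one is invented for this sketch.
NS = "urn:example:p1"

XML_1 = f"""<p1:car xmlns:p1="{NS}">
<p1:feature car="111" type="tires" unit="percent">511</p1:feature>
<p1:feature car="223" type="tires" unit="percent">513</p1:feature>
<p1:feature car="448" type="tires" unit="percent">520</p1:feature>
</p1:car>"""

XML_2 = f"""<p1:car xmlns:p1="{NS}">
<p1:feature car="223" type="tires" unit="percent">513</p1:feature>
<p1:feature car="440" type="tires" unit="percent">520</p1:feature>
</p1:car>"""


def index_features(xml_text):
    """Map (car, type) -> element text for every feature element."""
    root = ET.fromstring(xml_text)
    return {
        (el.get("car"), el.get("type")): (el.text or "").strip()
        for el in root.iter(f"{{{NS}}}feature")
    }


def compare(xml_a, xml_b):
    """Return readable differences between file one (a) and file two (b)."""
    a, b = index_features(xml_a), index_features(xml_b)
    report = []
    # Keys present in file one but absent from file two.
    for car, kind in sorted(a.keys() - b.keys()):
        report.append(
            f'In file two feature car="{car}", type="{kind}" '
            f"and text = {a[(car, kind)]} is missing"
        )
    # Keys present in file two but absent from file one.
    for car, kind in sorted(b.keys() - a.keys()):
        report.append(
            f'In file two feature car="{car}", type="{kind}" '
            f"and text = {b[(car, kind)]} is extra"
        )
    # Same key in both files, but different text.
    for car, kind in sorted(a.keys() & b.keys()):
        if a[(car, kind)] != b[(car, kind)]:
            report.append(
                f'For car="{car}", type="{kind}" file one has '
                f"{a[(car, kind)]} but file two has {b[(car, kind)]}"
            )
    return report


for line in compare(XML_1, XML_2):
    print(line)
```

Note that with this keying the car="448" versus car="440" case is reported as one missing entry plus one extra entry. Deciding that the two lines are really "the same" feature with a changed number would need an additional fuzzy-matching step, which this sketch does not attempt.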
Upvotes: 2