Reputation: 141

Meaningful XML comparison

I am trying to do a meaningful XML comparison. I want to compare two XML documents to find out whether they are equal in meaning, even if they are not textually identical.

Example XML 1:

    <?xml version="1.0" encoding="UTF-8"?>
    <al:moAttribute>
         <al:name>impiId</al:name>
         <al:value>616731935012345678</al:value>
    </al:moAttribute>

    <al:moAttribute>
          <al:name>impuId</al:name>
          <al:value>tel:+16167319350</al:value>
    </al:moAttribute>

Example XML 2:

    <?xml version="1.0" encoding="UTF-8"?>
    <al:moAttribute>
          <al:name>impuId</al:name>
          <al:value>tel:+16167319350</al:value>
    </al:moAttribute>
    <al:moAttribute>
         <al:name>impiId</al:name>
         <al:value>616731935012345678</al:value>
    </al:moAttribute>

In this example both XML documents are equal in meaning and differ only in the order of their elements. I want to compare them in a way that treats such documents as equal.

I tried the solution from this question:

Best way to compare 2 XML documents in Java

I tried:

    XMLUnit.setIgnoreWhitespace(true);
    diff.identical(...);
    diff.similar(...);

But if the XML documents differ in element order, the comparison returns false.

Any suggestions, please?

Upvotes: 0

Answers (5)

Stephen Flynn

Reputation: 125

I have solved this issue with an XSLT stylesheet that performs an unordered tree comparison; it is available on my GitHub. It outputs the matches and mismatches between any two XML files with regard to their position relative to the root of the tree. For example:

<a>
 <c/>
 <e/>
</a>

And:

<a>
 <e/>
 <c/>
</a>

Would be treated as equal. You just have to modify the file variable at the top of the sheet to choose which XML file to compare against. https://github.com/sflynn1812/xslt-diff-turbo

From an efficiency perspective the speed of any tree comparison algorithm is determined by the number of differences in the two trees.

To apply it to your example, I would suggest stripping out the XML namespaces first, because namespaces are not currently supported.

Upvotes: 0

Sravan Yadav Lingam

Reputation: 33

This works perfectly for me. It reports each difference at the location where the documents differ.

import java.io.File;
import java.io.IOException;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.custommonkey.xmlunit.DetailedDiff;
import org.custommonkey.xmlunit.Diff;
import org.custommonkey.xmlunit.Difference;
import org.custommonkey.xmlunit.XMLUnit;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class Xmlreader {
    public static void main(String[] args) throws SAXException, IOException, ParserConfigurationException {
        // Tell XMLUnit to ignore whitespace, comments and attribute order.
        XMLUnit.setIgnoreWhitespace(true);
        XMLUnit.setIgnoreComments(true);
        XMLUnit.setIgnoreAttributeOrder(true);

        // Configure a namespace-aware parser that merges CDATA into text
        // and drops comments and ignorable whitespace.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setCoalescing(true);
        dbf.setIgnoringElementContentWhitespace(true);
        dbf.setIgnoringComments(true);
        DocumentBuilder db = dbf.newDocumentBuilder();

        // Parse and normalize both documents the same way.
        Document doc1 = db.parse(new File("C:/Users/sravanlx/Desktop/base.xml"));
        doc1.normalizeDocument();

        Document doc2 = db.parse(new File("C:/Users/sravanlx/Desktop/base2.xml"));
        doc2.normalizeDocument();

        Diff diff = new Diff(doc1, doc2);
        System.out.println("Similar? " + diff.similar());
        System.out.println("Identical? " + diff.identical());

        // List every individual difference found between the documents.
        DetailedDiff detDiff = new DetailedDiff(diff);
        List<?> differences = detDiff.getAllDifferences();
        for (Object object : differences) {
            Difference difference = (Difference) object;
            System.out.println("***********************");
            System.out.println(difference);
            System.out.println("***********************");
        }
    }
}

Upvotes: 0

Michael Kay

Reputation: 163342

Any tools at the XML level will assume that the order of elements is significant. If you know that in your particular vocabulary, the order of elements is not significant, then you need a tool that works with an understanding of your vocabulary. Your best bet is therefore to write a normalizing transformation (typically in XSLT) that removes irrelevant differences from the documents (for example, by sorting elements on some suitable key) so that they then compare equal when compared using standard XML tools (perhaps after XML canonicalisation).
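As a rough illustration of such a normalizing step, here is a minimal sketch in plain Java (DOM only) rather than XSLT. It assumes that sibling order is insignificant everywhere in the vocabulary, that both documents use the same namespace prefixes, and that surrounding whitespace in text can be trimmed. The class and method names (`OrderInsensitiveXml`, `canonicalOf`, `equalIgnoringOrder`) are hypothetical, and the string it builds is not W3C Canonical XML: no escaping is done, so it is only suitable for comparison.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: normalizes a document by sorting siblings, then compares.
class OrderInsensitiveXml {

    // Builds a comparison string for an element in which attributes and
    // children are sorted, so their original order has no influence.
    static String canonicalOf(Node element) {
        List<String> attributes = new ArrayList<>();
        NamedNodeMap map = element.getAttributes();
        for (int i = 0; i < map.getLength(); i++) {
            Node attribute = map.item(i);
            attributes.add(attribute.getNodeName() + "=\"" + attribute.getNodeValue() + "\"");
        }
        Collections.sort(attributes);

        List<String> children = new ArrayList<>();
        NodeList nodes = element.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node child = nodes.item(i);
            short type = child.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                children.add(canonicalOf(child));
            } else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                // Assumption: leading/trailing whitespace carries no meaning.
                String text = child.getNodeValue().trim();
                if (!text.isEmpty()) {
                    children.add("#text:" + text);
                }
            }
            // Comments and processing instructions are ignored.
        }
        // The sort key is each child's own normalized form.
        Collections.sort(children);

        String name = element.getNodeName();
        return "<" + name + " " + attributes + ">" + children + "</" + name + ">";
    }

    // Parses an XML string and returns the normalized form of its root element.
    static String canonicalOf(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Node root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return canonicalOf(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // True if the two documents differ at most in sibling and attribute order.
    static boolean equalIgnoringOrder(String xml1, String xml2) {
        return canonicalOf(xml1).equals(canonicalOf(xml2));
    }
}
```

Note that the XML in the question would first need a single root element and a declaration for the `al` prefix to be parseable. If order matters for some elements in your vocabulary, the sort would have to be restricted to the elements where it does not.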

Upvotes: 1

Jayan

Reputation: 18459

You may find XMLUnit's RecursiveElementNameAndTextQualifier useful here. Here is a snippet:

XMLUnit.setIgnoreWhitespace(true);
XMLUnit.setIgnoreComments(true);
XMLUnit.setIgnoreAttributeOrder(true);

Document docx1 = XMLUnit.buildDocument(..);
Document docx2 = XMLUnit.buildDocument(..);

Diff diff = new Diff(docx1, docx2);
DifferenceEngine engine = new DifferenceEngine(diff);

ElementQualifier qualifier = new RecursiveElementNameAndTextQualifier();
diff = new Diff(docx1, docx2, engine, qualifier);
diff.overrideDifferenceListener(new DifferenceListener()
{
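    @Override public int differenceFound(Difference difference)
    {
         // Do something with the difference here, e.g.
         // return processDiff(difference);
         // Accepting the difference keeps XMLUnit's own classification.
         return RETURN_ACCEPT_DIFFERENCE;
    }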

    @Override public void skippedComparison(Node node, Node node1)
    {
        //no op
    }
});

//check diff.identical() || diff.similar();

Upvotes: 0

Jegg

Reputation: 549

You can achieve your goal using JAXB (example: http://www.mkyong.com/java/jaxb-hello-world-example/).

1 Construct two Java objects from the two given XML files using JAXB.

2 Each Java object then holds a list of the al:value entries of its XML file (this is the only part you care about).

3 Compare those two lists; please refer to "Simple way to find if two different lists contain exactly the same elements?"

By doing this, you will overcome the order problem.
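Step 3 could be sketched as follows. This is only an illustration of the list comparison, not of the JAXB unmarshalling; the class and method names (`UnorderedLists`, `counts`, `sameElements`) are hypothetical, and the lists are assumed to hold the al:value strings already extracted from each document.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: compares two lists as multisets, ignoring order.
class UnorderedLists {

    // Counts how often each item occurs, so duplicates are compared correctly.
    static <T> Map<T, Integer> counts(List<T> items) {
        Map<T, Integer> result = new HashMap<>();
        for (T item : items) {
            result.merge(item, 1, Integer::sum);
        }
        return result;
    }

    // True if both lists contain the same items the same number of times.
    static <T> boolean sameElements(List<T> first, List<T> second) {
        return first.size() == second.size() && counts(first).equals(counts(second));
    }
}
```

Comparing only the values ignores which al:name each value belongs to. Comparing lists of name/value pairs (for example strings such as "impiId=616731935012345678") would be a stricter check.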

Upvotes: 0
