KlaymenDK

Reputation: 724

Custom XML file comparison

I've seen that there are a lot of posts about XML comparison, but none of the ones I've looked at solve my problem.

We have some XML-formatted text documents (product descriptions, with headings and paragraphs) that are being updated (i.e. versioned), and I've been tasked with making change digests. That is, we want to take two consecutive versions of a file and generate a third: the heading structure (outline) is to be preserved, but only paragraphs with changes are to be kept. Additions as well as deletions should be marked up.

So I've been trying to find a way to walk both DOM trees and detect additions and deletions, but I'm having trouble detecting them reliably. Clearly what I need is a diff, but a plain text diff won't do, for two reasons: I want to do individual diffs inside each element, and the result has to be a fully formatted XML digest rather than traditional diff output.

Any hints before I try to tackle the "Longest common subsequence problem", which is going to be a huge task?
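
To make the "individual diffs inside each element" idea concrete, here is a minimal sketch in Python. It assumes the standard-library `difflib` module is acceptable; note that `difflib.SequenceMatcher` uses a matching-blocks heuristic rather than a strict longest-common-subsequence algorithm. The element names `para`, `ins` and `del` and the helper names `word_ops` and `to_element` are made up for illustration and are not part of any existing tool.

```python
import difflib
import xml.etree.ElementTree as ET


def word_ops(old_text, new_text):
    """Return a list of (kind, text) pairs; kind is 'same', 'del' or 'ins'."""
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    ops = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            ops.append(("same", " ".join(old_words[i1:i2])))
        if op in ("delete", "replace"):
            ops.append(("del", " ".join(old_words[i1:i2])))
        if op in ("insert", "replace"):
            ops.append(("ins", " ".join(new_words[j1:j2])))
    return ops


def to_element(ops, tag="para"):
    """Build one element whose unchanged words are plain text and whose
    changed words are wrapped in hypothetical <del>/<ins> children."""
    elem = ET.Element(tag)
    last = None  # most recently added child; unchanged text goes in its tail
    for n, (kind, text) in enumerate(ops):
        sep = " " if n < len(ops) - 1 else ""
        if kind == "same":
            if last is None:
                elem.text = (elem.text or "") + text + sep
            else:
                last.tail = (last.tail or "") + text + sep
        else:
            last = ET.SubElement(elem, kind)
            last.text = text
            last.tail = sep
    return elem


if __name__ == "__main__":
    ops = word_ops("The pump runs at 50 Hz", "The pump runs quietly at 60 Hz")
    print(ET.tostring(to_element(ops), encoding="unicode"))
```

Splitting on whitespace keeps the sketch short; real documents with inline markup inside paragraphs would need a tokenizer that understands child elements.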

Upvotes: 3

Views: 1154

Answers (3)

KlaymenDK

Reputation: 724

It turns out there was no ready-made solution for my need at the time! In the meantime I developed my own XML diff routine, tailored to my specific problem, so I ended up with a working solution.
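
For anyone wanting a starting point, here is a rough sketch of what such a problem-specific routine could look like. This is not the routine mentioned above, just an illustration in Python under stated assumptions: the documents are assumed to consist of `section` elements, each holding one `heading` and any number of `para` children, headings are assumed to be unique, and the `digest` root element and the `change` attribute are invented names.

```python
import difflib
import xml.etree.ElementTree as ET


def paragraphs_by_heading(root):
    """Map heading text -> list of paragraph texts, in document order.
    Assumes headings are unique; a repeated heading overwrites the earlier one."""
    result = {}
    for section in root.iter("section"):
        heading = section.findtext("heading", default="").strip()
        result[heading] = [(p.text or "").strip() for p in section.findall("para")]
    return result


def make_digest(old_root, new_root):
    """Build a digest: every heading is kept, only changed paragraphs appear."""
    old = paragraphs_by_heading(old_root)
    new = paragraphs_by_heading(new_root)
    digest = ET.Element("digest")
    # Headings of the new version first, then headings only the old version had.
    headings = list(new) + [h for h in old if h not in new]
    for heading in headings:
        section = ET.SubElement(digest, "section")
        ET.SubElement(section, "heading").text = heading
        a, b = old.get(heading, []), new.get(heading, [])
        matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            # Unchanged paragraphs ("equal") are deliberately left out.
            if op in ("delete", "replace"):
                for text in a[i1:i2]:
                    ET.SubElement(section, "para", change="deleted").text = text
            if op in ("insert", "replace"):
                for text in b[j1:j2]:
                    ET.SubElement(section, "para", change="added").text = text
    return digest
```

Calling `make_digest(ET.parse("v1.xml").getroot(), ET.parse("v2.xml").getroot())` (file names hypothetical) returns an element that can be written out with `ET.tostring`. Paragraphs are compared as whole strings here, so a one-word edit shows up as a deleted paragraph followed by an added one.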

Then, in late 2011, this was published on Slashdot: "Researchers Expanding Diff, Grep Unix Tools"

Dartmouth computer scientists presented variants of the grep and diff Unix command line utilities that can handle more complex types of data. The new programs, called Context-Free Grep and Hierarchical Diff, will provide the ability to parse blocks of data rather than single lines. The research has been funded in part by Google and the U.S. Energy Department.

Upvotes: 0

Michael Kay

Reputation: 163342

A professional solution to this problem, though not a free one, is the DeltaXML product. Buying it will probably be cheaper than building your own.

Upvotes: 2

oiavorskyi

Reputation: 2941

I would suggest using XMLUnit as the differencing engine. It lets you plug in your own DifferenceListener, which is notified whenever two nodes differ. In that handler you can add the appropriate DOM nodes to your target document.

Upvotes: 4
