Ogen

Reputation: 6709

Semantic equivalence of XML documents

Let's say I have these two simple XML files:

Example 1

<parent name="Bob" gender="male">
    <child name="Steve" gender="male"></child>
    <child name="Stephanie" gender="female"></child>
</parent>

Example 2

<parent name="Bob" gender="male">
    <child name="Stephanie" gender="female"></child>

    <child name="Steve" gender="male"></child>

</parent>

I am trying to create a function that takes two strings representing XML and returns true if and only if they are semantically equivalent. In this case, even though there are whitespace differences and the order of the children is different, the two XML files are still semantically identical.

I currently have a seemingly working solution, but I'm afraid it may have drawbacks or that I may have overthought the issue. My current solution involves three steps:

  1. Remove all whitespace from the strings
  2. Sort the strings alphanumerically
  3. Perform a standard string equality check
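A minimal Python sketch of those three steps, assuming "sort the strings" means sorting the individual characters of each string. The helper name `naive_equivalent` is hypothetical. The sketch accepts the two examples above. It also shows the risk: any two documents built from the same characters compare equal, and harmless lexical variations such as quote style compare unequal.

```python
def naive_equivalent(xml_a: str, xml_b: str) -> bool:
    """Hypothetical helper: the three steps from the question, reading
    "sort the strings" as "sort the characters of each string" (an assumption)."""

    def normalize(s: str) -> str:
        no_whitespace = "".join(s.split())  # step 1: drop every whitespace character
        return "".join(sorted(no_whitespace))  # step 2: sort the characters

    return normalize(xml_a) == normalize(xml_b)  # step 3: plain string equality


# Accepted, although not equivalent: the attribute values have been swapped.
print(naive_equivalent('<p name="Bob" gender="male"/>', '<p name="male" gender="Bob"/>'))  # True
# Rejected, although equivalent: only the quote style differs.
print(naive_equivalent('<p a="1"/>', "<p a='1'/>"))  # False
```

Removing all whitespace has a similar problem: `name="Mary Ann"` and `name="MaryAnn"` become indistinguishable.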

This solution seems to work but I'm wondering if there are any problems with it or if I should be tackling this issue another way.

Upvotes: 1

Views: 468

Answers (3)

Dave Black

Reputation: 8049

I have a project I wrote based on work by Eric White (@eric-white), formerly of Microsoft. I just have to dust it off, probably make some updates, and post it to GitHub here: https://github.com/udlose/xml-equivalency

I will update this post once the code has been posted.

Upvotes: 0

Michael Kay

Reputation: 163645

What's significant in the XML is something that only you can decide. Steve and Stephen might or might not be the same name. Better not to use the word "semantics": just define what your equivalence rules are. Your general approach to testing equivalence is to define a normal form, transform data into the normal form, and then do a simple lexical test of the normalized values - and that's a perfectly reasonable way of going about it. But only you can decide what the appropriate normalization function is.
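As an illustration, here is one possible normalization function, sketched in Python with the standard-library ElementTree parser. The rules are assumptions chosen to fit the question's examples: attribute order, sibling order and surrounding whitespace are treated as insignificant, while names and values must match exactly. These rules would be wrong for mixed content, where order matters. The names `normal_form` and `equivalent` are hypothetical.

```python
import xml.etree.ElementTree as ET


def normal_form(elem: ET.Element) -> tuple:
    """Hypothetical normal form. Assumed rules: attribute order, child order and
    surrounding whitespace do not matter; tags, names and values must match exactly."""
    attrs = tuple(sorted(elem.attrib.items()))
    text = (elem.text or "").strip()
    # Each child is paired with its tail (the text after its end tag) and the
    # pairs are sorted, so reordering siblings does not change the result.
    children = tuple(sorted(
        (normal_form(child), (child.tail or "").strip()) for child in elem
    ))
    return (elem.tag, attrs, text, children)


def equivalent(xml_a: str, xml_b: str) -> bool:
    # The default parser discards comments and processing instructions.
    return normal_form(ET.fromstring(xml_a)) == normal_form(ET.fromstring(xml_b))
```

Under these rules the two examples compare equal, but `name="Steve"` and `name="Stephen"` do not. Treating them as the same would be an additional rule that you would have to define yourself.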

Upvotes: 1

kjhughes

Reputation: 111756

XML has no intrinsic semantics. Semantics generally refers to meaning, and as a data format, XML itself is not concerned with meaning.

What you really appear to seek is a form of equivalence between two XML documents. To be able to apply the sort of "standard string equality check" you mention, consider putting the XML into a standard lexical form, such as the canonical form defined by the W3C Canonical XML (C14N) recommendations.
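For example, Python's standard library (3.8+) provides `xml.etree.ElementTree.canonicalize()`, which performs a C14N 2.0 transformation. The minimal sketch below wraps it in a hypothetical helper, `c14n`. Canonicalization normalizes attribute order, quote style and empty-element syntax, and with `strip_text=True` it also drops whitespace-only text between elements.

```python
import xml.etree.ElementTree as ET


def c14n(doc: str) -> str:
    # strip_text=True trims whitespace around text content, so whitespace-only
    # text between elements disappears; comments are omitted by default.
    return ET.canonicalize(doc, strip_text=True)


a = '<parent name="Bob" gender="male">\n  <child name="Steve" gender="male"/>\n</parent>'
b = "<parent gender='male' name='Bob'><child gender='male' name='Steve'></child></parent>"
print(c14n(a) == c14n(b))  # True
```

Canonical XML keeps child elements in document order, though. Example 1 and Example 2 still produce different canonical strings, so treating sibling order as insignificant would need an extra normalization step of your own.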

Finally, you might consider two documents to be equivalent at a grammatical rather than a lexical level by defining equivalence to be true if the documents are both valid according to the same XML schema.

Upvotes: 2
