Ijon Tichy

Reputation: 61

How to compare two XSComplexType for equality?

I'm automatically generating XML schemata, and the resulting XSD files follow the Venetian Blind design pattern. Right now I have a lot of complex types and want to reduce their number. Is there an easy way to figure out whether two complex types describe the same restrictions?

Here's an example to show you what I mean:

    <xs:complexType name="someType">
      <xs:choice>
          <xs:element name="BR" type="xs:string"/>
          <xs:element name="A" type="xs:string"/>
      </xs:choice>
    </xs:complexType>

    <xs:complexType name="someOtherType">
      <xs:choice>
          <xs:element name="A" type="xs:string"/>
          <xs:element name="BR" type="xs:string"/>
      </xs:choice>
    </xs:complexType>

Obviously "someType" and "someOtherType" are equivalent. Now let's say I want to find out which types in two schemata are equivalent. I'm parsing the schemata using XSOM:

    import java.io.File;
    import java.io.IOException;
    import java.util.Map;

    import javax.xml.parsers.SAXParserFactory;

    import org.xml.sax.SAXException;

    import com.sun.xml.xsom.XSComplexType;
    import com.sun.xml.xsom.XSSchema;
    import com.sun.xml.xsom.XSSchemaSet;
    import com.sun.xml.xsom.parser.XSOMParser;

    public class MyXSOM {

        public static void main(String[] args) {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            XSOMParser parser = new XSOMParser(factory);

            try {
                parser.parse(new File("schema.xsd"));
                parser.parse(new File("schema2.xsd"));

                XSSchemaSet sset = parser.getResult();
                XSSchema schema1 = sset.getSchema(0);
                XSSchema schema2 = sset.getSchema(1);

                Map<String, XSComplexType> schema1ComplexTypes = schema1.getComplexTypes();
                Map<String, XSComplexType> schema2ComplexTypes = schema2.getComplexTypes();

                // Compare every complex type of the first schema with every one of the second.
                for (XSComplexType currentType1 : schema1ComplexTypes.values()) {
                    for (XSComplexType currentType2 : schema2ComplexTypes.values()) {
                        // if currentType1 and currentType2 define the same complexType, do something
                    }
                }

            } catch (SAXException | IOException e) {
                e.printStackTrace();
            }
        }
    }

Is there an elegant way of checking for this kind of equality between two "complexType" nodes?

Upvotes: 1

Views: 222

Answers (1)

C. M. Sperberg-McQueen

Reputation: 25054

I don't know of any good off-the-shelf type comparison tools. (Which doesn't mean there aren't any.)

To roll your own, it will not (pace Michael Kay) be necessary to solve the problem of equivalence of two context-free grammars: a complex type's content model defines a regular language, not a context-free language. The atomic symbols of this language are the XML element-type names of the possible children (wildcards complicate this somewhat, but not insolubly), and the content model essentially defines a regular expression over that alphabet.

You will need to decide whether by "equality" of complex types you mean that they accept exactly the same set of inputs as valid, or that they produce exactly the same set of type-annotated output trees (PSVIs), or both.

The first is reasonably straightforward: any automata-theory textbook will explain how to construct finite-state automata (FSAs) from regular expressions and how to test whether two FSAs recognize the same language. (But since you ask for an elegant way, I'll observe that very few of the automata-theory textbooks I have seen discuss Brzozowski derivatives, which offer an alternative approach to such tasks that at least some readers find more elegant.)
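To make the derivative idea concrete, here is a minimal sketch in plain Java. It does not use XSOM: it assumes you have already translated each content model into a small regular-expression tree over child element names. The class `ContentModelEquivalence` and all of its methods are hypothetical names invented for this illustration, and occurrence ranges other than "zero or more", `xs:all`, and wildcards are left out. Two expressions are treated as equivalent when no sequence of derivatives leads to a pair in which one side accepts the empty sequence and the other does not.

```java
import java.util.*;

// Sketch only: content models as regular expressions over child element names,
// compared via Brzozowski derivatives. All names here are hypothetical.
class ContentModelEquivalence {

    enum Kind { EMPTY, EPSILON, SYMBOL, SEQ, ALT, STAR }

    /** Immutable expression; the canonical key doubles as its identity. */
    static final class Re {
        final Kind kind; final String name; final List<Re> kids; final String key;
        Re(Kind kind, String name, Re... kids) {
            this.kind = kind; this.name = name; this.kids = Arrays.asList(kids);
            StringBuilder sb = new StringBuilder(kind.name()).append('(');
            if (name != null) sb.append(name);
            for (Re k : kids) sb.append(k.key).append(',');
            this.key = sb.append(')').toString();
        }
    }

    static final Re EMPTY = new Re(Kind.EMPTY, null);     // matches nothing
    static final Re EPSILON = new Re(Kind.EPSILON, null); // matches the empty sequence

    static Re sym(String name) { return new Re(Kind.SYMBOL, name); }

    static Re seq(Re a, Re b) {
        if (a.kind == Kind.EMPTY || b.kind == Kind.EMPTY) return EMPTY;
        if (a.kind == Kind.EPSILON) return b;
        if (b.kind == Kind.EPSILON) return a;
        return new Re(Kind.SEQ, null, a, b);
    }

    /** Choice, normalized (flattened, sorted, de-duplicated) so the search below terminates. */
    static Re alt(Re a, Re b) {
        TreeMap<String, Re> parts = new TreeMap<>();
        for (Re r : Arrays.asList(a, b)) {
            if (r.kind == Kind.ALT) for (Re k : r.kids) parts.put(k.key, k);
            else if (r.kind != Kind.EMPTY) parts.put(r.key, r);
        }
        if (parts.isEmpty()) return EMPTY;
        if (parts.size() == 1) return parts.firstEntry().getValue();
        return new Re(Kind.ALT, null, parts.values().toArray(new Re[0]));
    }

    static Re star(Re a) {
        if (a.kind == Kind.STAR) return a;
        if (a.kind == Kind.EMPTY || a.kind == Kind.EPSILON) return EPSILON;
        return new Re(Kind.STAR, null, a);
    }

    /** Does r accept the empty sequence of children? */
    static boolean nullable(Re r) {
        switch (r.kind) {
            case EPSILON: case STAR: return true;
            case SEQ: return nullable(r.kids.get(0)) && nullable(r.kids.get(1));
            case ALT: for (Re k : r.kids) if (nullable(k)) return true; return false;
            default: return false; // EMPTY, SYMBOL
        }
    }

    /** Brzozowski derivative: what may follow after one child element named s. */
    static Re deriv(Re r, String s) {
        switch (r.kind) {
            case SYMBOL: return r.name.equals(s) ? EPSILON : EMPTY;
            case SEQ: {
                Re a = r.kids.get(0), b = r.kids.get(1);
                return alt(seq(deriv(a, s), b), nullable(a) ? deriv(b, s) : EMPTY);
            }
            case ALT: {
                Re sum = EMPTY;
                for (Re k : r.kids) sum = alt(sum, deriv(k, s));
                return sum;
            }
            case STAR: return seq(deriv(r.kids.get(0), s), r);
            default: return EMPTY; // EMPTY, EPSILON
        }
    }

    /** Explore pairs of derivatives; any pair that disagrees on nullability is a counterexample. */
    static boolean equivalent(Re a, Re b, Set<String> alphabet) {
        Deque<Re[]> todo = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        todo.push(new Re[] { a, b });
        while (!todo.isEmpty()) {
            Re[] pair = todo.pop();
            if (!seen.add(pair[0].key + " | " + pair[1].key)) continue;
            if (nullable(pair[0]) != nullable(pair[1])) return false;
            for (String s : alphabet)
                todo.push(new Re[] { deriv(pair[0], s), deriv(pair[1], s) });
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>(Arrays.asList("A", "BR"));
        Re someType = alt(sym("BR"), sym("A"));      // <xs:choice> of BR, A
        Re someOtherType = alt(sym("A"), sym("BR")); // <xs:choice> of A, BR
        System.out.println(equivalent(someType, someOtherType, names));
        // A sequence is order-sensitive, so these two should differ.
        System.out.println(equivalent(seq(sym("A"), sym("BR")), seq(sym("BR"), sym("A")), names));
    }
}
```

The `alphabet` argument is assumed to be the union of the child element names used by both types. The normalization inside `alt` is what keeps the set of distinct derivatives finite, so the loop is guaranteed to stop.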

To check further for equivalence of the type and other annotations on the output, you will need to ensure that for each symbol in the language, the two complex types assign the same type, or equivalent types, to the elements bearing that symbol; you will be aided in this task by the so-called Element Declarations Consistent constraint, which ensures that in any legal XSD schema no two sibling elements with the same name may be assigned different types. (Unfortunately, despite its name, it does NOT ensure that two siblings of the same name will have the same values for other properties of the element declaration, such as nillability or annotations; opinions may differ on whether this is a bug or a feature.)
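The per-symbol check could be sketched roughly as follows, again in plain Java rather than against the XSOM API. It assumes you have already collected, for each complex type, one summary per child element name; `Decl`, `mismatches`, and `ElementAnnotationCheck` are hypothetical names. Comparing types by name is a simplification: anonymous or structurally identical types would need a recursive comparison instead.

```java
import java.util.*;

// Sketch only: Decl and mismatches are hypothetical names, not XSOM API.
class ElementAnnotationCheck {

    /** Summary of one child element declaration: its type name and nillability. */
    static final class Decl {
        final String typeName; final boolean nillable;
        Decl(String typeName, boolean nillable) { this.typeName = typeName; this.nillable = nillable; }
    }

    /** Lists every element name on which the two complex types disagree. */
    static List<String> mismatches(Map<String, Decl> left, Map<String, Decl> right) {
        Set<String> names = new TreeSet<>(left.keySet());
        names.addAll(right.keySet());
        List<String> problems = new ArrayList<>();
        for (String name : names) {
            Decl l = left.get(name), r = right.get(name);
            if (l == null || r == null) problems.add(name + ": declared in only one type");
            else if (!l.typeName.equals(r.typeName)) problems.add(name + ": type differs");
            else if (l.nillable != r.nillable) problems.add(name + ": nillable differs");
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Decl> someType = new HashMap<>();
        someType.put("A", new Decl("xs:string", false));
        someType.put("BR", new Decl("xs:string", false));
        Map<String, Decl> someOtherType = new HashMap<>();
        someOtherType.put("A", new Decl("xs:string", false));
        someOtherType.put("BR", new Decl("xs:string", true)); // differs only in nillability
        System.out.println(mismatches(someType, someOtherType)); // prints [BR: nillable differs]
    }
}
```

An empty result means the two types agree on every child element under these assumptions. Because nillability is not covered by Element Declarations Consistent, a real implementation would also have to decide what to do when a single type declares the same name with different nillability.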

Upvotes: 1
