beyonddc

Reputation: 1256

How to handle evolving XML schema in Java

I have a use case where I have to support a set of old and evolving XML schemas in a Java application (i.e. supporting Foo v1, v2, v3, v4, v5).

My use cases include:
- reading all Foo XML content written against different versions of the Foo XML schema
- merging Foo XML content written against different versions of the Foo XML schema (i.e. merging Foo v1 with Foo v5)

The Foo XML schema is fairly complicated and has known backward-compatibility issues, so Foo v1 XML content can fail validation against the Foo v3 XML schema.

I thought of 2 approaches:

1) Use Java XML data binding such as JAXB and generate a set of bindings for each version of the XML schema. Using the Foo XML schema as an example, I would generate 5 sets of bindings, one each for Foo XML schema v1 to v5. The challenge is how to merge one version of Foo XML content with another version.

2) Create a single Java data model and parse the XML manually using SAX, DOM, or JDOM, resolving any backward-compatibility issues myself. The challenge is that I now have to parse the XML without the help of JAXB.

I would like some advice on the best approach for handling an evolving XML schema. Is Java XML data binding the right path forward, or should I create my own Java data model and parse the XML manually?

Upvotes: 3

Views: 874

Answers (3)

Michael Kay

Reputation: 163587

Schema evolution is the big drawback of the data binding approach. If your schema is not stable, then data binding is going to be a hassle, as you have discovered. There's a basic conflict here: XML is designed to be flexible ("semi-structured") in the data structures it handles, and Java is not. Are you sure that data binding is the right approach for you? Might it not be better to use a programming language designed for XML, such as XSLT or XQuery?
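As a rough illustration of the XSLT route, the sketch below runs a small upgrade stylesheet from Java through the standard `javax.xml.transform` API. The schema change it handles (a `name` element renamed to `title`, plus a `version` attribute on `foo`) is purely an assumption for the example, as are the class and method names; a real Foo upgrade would need its own templates.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FooUpgradeXslt {
    // Hypothetical v1 -> v2 change: <name> becomes <title>, version attribute becomes 2.
    // Everything else is copied unchanged by the identity template.
    static final String V1_TO_V2 =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='foo/@version'>"
      + "<xsl:attribute name='version'>2</xsl:attribute>"
      + "</xsl:template>"
      + "<xsl:template match='foo/name'>"
      + "<title><xsl:apply-templates select='@*|node()'/></title>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to a v1 document and returns the upgraded XML as a string.
    static String upgrade(String v1Xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(V1_TO_V2)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(v1Xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Upgrade from v1 to v2 failed", e);
        }
    }
}
```

Calling `FooUpgradeXslt.upgrade("<foo version='1'><name>Widget</name></foo>")` should yield a document whose `name` element has become `title`. One stylesheet per version step keeps each change small and testable on its own.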

Upvotes: 1

Marcel Stör

Reputation: 23565

We have Java converters for each new version. Each one converts from the respective previous version. We get v1 as XML, transform it to Java using JAXB, then convert step by step to the data model of v2, v3, v4, v5. The converters are all under version control and part of every released artifact.

Also, we support branches like v2-1, v2-2. This requires that we have converters from branch n to the next major n+1 (e.g. v2-2 -> v3). At certain intervals we stop support for "very old" branches.
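A minimal sketch of such a converter chain, assuming three versions and invented fields: the `ChainFooV1`, `ChainFooV2`, and `ChainFooV3` classes are hypothetical stand-ins for JAXB-generated classes, and the default value used for the new `owner` field is an assumption, not something taken from a real schema.

```java
import java.util.function.Function;

// Hypothetical per-version models; in practice these would be JAXB-generated.
class ChainFooV1 {
    final String name;
    ChainFooV1(String name) { this.name = name; }
}

class ChainFooV2 {
    final String title; // assumed rename of v1's "name"
    ChainFooV2(String title) { this.title = title; }
}

class ChainFooV3 {
    final String title;
    final String owner; // assumed new field in v3
    ChainFooV3(String title, String owner) { this.title = title; this.owner = owner; }
}

class FooUpgrades {
    // Each converter only knows about the version immediately before it.
    static final Function<ChainFooV1, ChainFooV2> V1_TO_V2 =
        v1 -> new ChainFooV2(v1.name);

    static final Function<ChainFooV2, ChainFooV3> V2_TO_V3 =
        v2 -> new ChainFooV3(v2.title, "unknown");

    // Older content reaches the newest model by composing the single steps.
    static final Function<ChainFooV1, ChainFooV3> V1_TO_V3 =
        V1_TO_V2.andThen(V2_TO_V3);
}
```

With this layout, adding v4 means writing one new converter from v3 and appending it to the chain; none of the existing converters need to change.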

Upvotes: 1

stevevls

Reputation: 10853

In my experience, the most important thing is the data model and not the input formats. If you can provide a clean model and abstract away all the nastiness of the different inputs, you'll end up with a much cleaner and more manageable codeline.

Given that versions of a single document tend to be incremental, you can probably get a fair amount of code reuse if you write the parsers yourself. Alternatively, you can create parallel JAXB packages, one per format, each paired with a class that converts that version-specific model to your top-level model.
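The second option could look roughly like the sketch below. All names are hypothetical: `CanonicalFoo` is the top-level model, `BoundFooV1` and `BoundFooV5` stand in for JAXB-generated classes, and the difference between the versions (a single item in v1, a list in v5) is assumed for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Version-independent model that the rest of the application works with.
class CanonicalFoo {
    final List<String> items = new ArrayList<>();

    // Merging happens only on the canonical model, so source versions do not matter.
    CanonicalFoo merge(CanonicalFoo other) {
        CanonicalFoo merged = new CanonicalFoo();
        merged.items.addAll(this.items);
        merged.items.addAll(other.items);
        return merged;
    }
}

// Stand-ins for JAXB-generated classes of two schema versions.
class BoundFooV1 { String item; }                                // assumed: v1 holds one item
class BoundFooV5 { List<String> items = new ArrayList<>(); }     // assumed: v5 holds many

// One adapter per schema version hides that version's quirks.
interface FooAdapter<T> {
    CanonicalFoo toCanonical(T bound);
}

class FooAdapters {
    static final FooAdapter<BoundFooV1> V1 = bound -> {
        CanonicalFoo foo = new CanonicalFoo();
        if (bound.item != null) {
            foo.items.add(bound.item);
        }
        return foo;
    };

    static final FooAdapter<BoundFooV5> V5 = bound -> {
        CanonicalFoo foo = new CanonicalFoo();
        foo.items.addAll(bound.items);
        return foo;
    };
}
```

Merging Foo v1 with Foo v5 then reduces to `FooAdapters.V1.toCanonical(a).merge(FooAdapters.V5.toCanonical(b))`, and backward-compatibility fixes stay confined to the adapter for the version that needs them.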

Upvotes: 3
