Schema design: Loose and repeating vs rigid?

Question

Apologies if this question is too open ended, and would be better asked somewhere else.

I'm currently in discussions about schema design for XML schemas representing sets of questionnaires. The questionnaires are reasonably similar, and contain questions which have a few attributes:

Question unique ID
Question title
Question answer type (text, numeric, date etc.)
Answer value

We capture answers to the questionnaires through a website, and now need to send them to other parties via XML messages. I am wondering about the pros and cons of sending XML documents representing filled-out questionnaires in:

A 'repeating and flexible' format like this:


    
        
            1234
            Interviewee
            Joe Bloggs
        
        
            1235
            Date of birth
            1980-03-15
        
        ...

A more rigid format:


    Joe Bloggs
    1980-03-15
    ...
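The markup of the two samples did not survive in this copy, so here is a rough sketch of what each shape could look like. Only the values (1234, Interviewee, Joe Bloggs and so on) come from the question; every element name below (`questionnaire`, `question`, `id`, `title`, `answer`, `interviewee`, `dateOfBirth`) is an assumption made purely for illustration.

```xml
<!-- Two separate documents shown together for comparison. -->
<!-- All element names are hypothetical; only the values come from the question. -->

<!-- Option 1 (repeating and flexible): every question has the same generic shape. -->
<questionnaire>
    <question>
        <id>1234</id>
        <title>Interviewee</title>
        <answer>Joe Bloggs</answer>
    </question>
    <question>
        <id>1235</id>
        <title>Date of birth</title>
        <answer>1980-03-15</answer>
    </question>
</questionnaire>

<!-- Option 2 (rigid): one dedicated element per question. -->
<questionnaire>
    <interviewee>Joe Bloggs</interviewee>
    <dateOfBirth>1980-03-15</dateOfBirth>
</questionnaire>
```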

There are few enough questionnaires that creating a separate XSD each time a new one was produced probably wouldn't cause too large a headache.

My thinking is that the second would be much easier to validate, and since the questions have types, these would map nicely to schema data types. However, the first would naturally be easier to produce from our underlying representation of the questions when we capture responses via the website.
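To make the validation point concrete, a schema for the rigid format might look roughly like the sketch below. The element names are assumptions, and `householdSize` is a made-up numeric question added only to show a third data type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: element names are hypothetical, not taken from a real questionnaire. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="questionnaire">
        <xs:complexType>
            <xs:sequence>
                <!-- Text question maps to xs:string -->
                <xs:element name="interviewee" type="xs:string"/>
                <!-- Date question maps to xs:date, so "1980-03-15" validates and "15/03/1980" does not -->
                <xs:element name="dateOfBirth" type="xs:date"/>
                <!-- Hypothetical optional numeric question -->
                <xs:element name="householdSize" type="xs:nonNegativeInteger" minOccurs="0"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
```

With this shape, each question's answer type is enforced by the schema itself, at the cost of writing one such XSD per questionnaire.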

Are there any general best practices for these kinds of design decisions, or an obvious reason why one approach would cause fewer headaches than the other, or does this really just depend on the particular set of data being represented and how it will be processed?

Petru Gardea · Accepted Answer

I believe your question is the same as abstract vs. concrete, or key/value pairs vs. structure, and so on. I also don't think it is specific to XML: it would present itself equally to an OO programmer or a database designer. Asking about a best practice here is the same as asking about the best religion...

Let’s look at your options:

Option 1 is a metamodel: it describes what a question is. Option 2 is a model: it describes what you’re asking.

Option 1 is the sys schema in every database. Option 2 is the user defined schema in a particular database.

Option 1 is "technical": it is about IT folks building great infrastructure for questionnaires. Option 2 is all about the "business" the questionnaire is about.

One issue is that 1 is more verbose than 2 (runtime overhead), while 2 requires more upfront analysis of what we are asking (design-time overhead). 1 is better positioned to accommodate new questions or to extend what a question is, e.g. with a difficulty level or how long it took to get an answer (extensibility). The XSD for 2 is self-contained and self-describing: it shows how to properly consume an instance XML (usability of the spec).

In my experience, the ability to implement cheaply and to manage change cheaply is very important (cost of ownership); unfortunately, providers and consumers of a service may perceive these costs differently. Finding that sweet spot, a practicable design, which is basically your question, will need to take more into account, along the lines I’ve tried to describe above.

Speaking strictly to some of the issues you mentioned about Option 1: the lack of strong typing, for example, could be alleviated by using simple choices or a substitution group. However, this kind of thing will not substantially change any of what I said.
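As a rough illustration of the "simple choices" idea, the sketch below keeps the generic repeating question but replaces a single untyped answer with a choice of typed answer elements. All names (`QuestionType`, `textAnswer`, `numericAnswer`, `dateAnswer` and so on) are assumptions for the sake of the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: all element and type names are hypothetical. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="questionnaire">
        <xs:complexType>
            <xs:sequence>
                <!-- Any number of generic questions, as in Option 1 -->
                <xs:element name="question" type="QuestionType" maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

    <xs:complexType name="QuestionType">
        <xs:sequence>
            <xs:element name="id" type="xs:positiveInteger"/>
            <xs:element name="title" type="xs:string"/>
            <!-- Exactly one typed answer per question -->
            <xs:choice>
                <xs:element name="textAnswer" type="xs:string"/>
                <xs:element name="numericAnswer" type="xs:decimal"/>
                <xs:element name="dateAnswer" type="xs:date"/>
            </xs:choice>
        </xs:sequence>
    </xs:complexType>
</xs:schema>
```

A substitution group would achieve much the same thing by declaring an abstract head element (say `answer`) and letting each typed element name it in its `substitutionGroup` attribute. Either way, the schema checks that a `dateAnswer` holds a valid date, but in XSD 1.0 it cannot require that question 1235 in particular be answered with a date, which is why this does not change the overall trade-off.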

Considering the above then, I’ll annotate your question, to put things right where they belong.

Are there any general best practices for these kinds of design decisions [see my reference to choosing a "religion"], or an obvious reason why one approach would cause fewer headaches than the other [it all depends on many more things which you didn’t mention], or does this really just depend on the particular set of data being represented and how it will be processed [yes, to a high degree]?
