lexicore

Reputation: 43661

Semantics of xsd:dateTime without a timezone and its conversion to Date

I have a question concerning XML Schema's built-in type xsd:dateTime.

What are the exact semantics of an xsd:dateTime without a timezone, e.g. 1970-01-01T00:00:00?

I've read through a number of XML Schema spec documents but could not find out how it should be processed.

Specifically, I want to understand how to convert xsd:dateTime to the Date (like java.util.Date or JavaScript Date) object correctly.

Side note: I am perfectly aware of Java utility classes like DatatypeConverter or DatatypeFactory; what I would like to find is the XML Schema spec that defines how to do this conversion.

The problem with the Date class (in Java as well as in JavaScript) is that it represents an absolute instant, so building one from date and time fields requires a time zone (defaulting to the local one). If I get an xsd:dateTime without a time zone as input, then I somehow have to decide which time zone to assume. Otherwise I simply can't convert it to a timezoned value (like Date).
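Whether that decision is needed at all can be checked on the value itself with the same javax.xml.datatype API the question mentions: XMLGregorianCalendar.getTimezone() returns the offset in minutes, or DatatypeConstants.FIELD_UNDEFINED when the lexical form carries no timezone. A minimal sketch; the class and method names (TimezonePresence, hasTimezone) are hypothetical.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeConstants;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

// Hypothetical helper: reports whether a lexical xsd:dateTime has an explicit timezone.
final class TimezonePresence {
    private static final DatatypeFactory FACTORY;

    static {
        try {
            FACTORY = DatatypeFactory.newInstance();
        } catch (DatatypeConfigurationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** True if the value carries a timezone such as "Z" or "+01:00". */
    static boolean hasTimezone(String lexical) {
        XMLGregorianCalendar cal = FACTORY.newXMLGregorianCalendar(lexical);
        // getTimezone() is the offset in minutes, or FIELD_UNDEFINED when absent.
        return cal.getTimezone() != DatatypeConstants.FIELD_UNDEFINED;
    }

    public static void main(String[] args) {
        System.out.println(hasTimezone("1970-01-01T00:00:00"));       // false
        System.out.println(hasTimezone("1970-01-01T00:00:00Z"));      // true
        System.out.println(hasTimezone("1970-01-01T00:00:00+01:00")); // true
    }
}
```

Note that "Z" counts as present (offset 0); only a missing timezone yields FIELD_UNDEFINED.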

Now the question is, what should I assume? I see the following options here:

  • assume UTC

  • assume the local (default) time zone of the machine doing the conversion

I don't really like the second option. It is entirely random! On my machine, if I run

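// DatatypeFactory is javax.xml.datatype.DatatypeFactory; newInstance() throws DatatypeConfigurationException
DatatypeFactory DATATYPE_FACTORY = DatatypeFactory.newInstance();
System.out.println(DATATYPE_FACTORY
    .newXMLGregorianCalendar("1970-01-01T00:00:00")
    .toGregorianCalendar().getTime().getTime());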

I'll get -3600000, 0 or 3600000 for GMT+1, GMT or GMT-1 respectively (and even more variants depending on daylight saving time). This is so arbitrary; I'm really not getting it. Does this mean that when we have an XML document with an element like

<date-time>1970-01-01T00:00:00</date-time>

we actually have no idea which exact instant was meant?
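The spread of results comes from XMLGregorianCalendar.toGregorianCalendar(), which falls back to TimeZone.getDefault() when the value has no timezone. The sketch below reproduces the three numbers by temporarily switching the JVM default zone. DefaultZoneDemo and millisUnderDefault are hypothetical names, and since the default zone is process-wide state, this is meant as a demonstration only.

```java
import java.util.TimeZone;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;

// Hypothetical demo: same lexical value, different JVM default zones.
final class DefaultZoneDemo {
    /** Epoch millis that toGregorianCalendar() yields for the value under the given default zone. */
    static long millisUnderDefault(String lexical, String zoneId) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            return DatatypeFactory.newInstance()
                    .newXMLGregorianCalendar(lexical)
                    .toGregorianCalendar() // no timezone in the value: TimeZone.getDefault() is used
                    .getTime()
                    .getTime();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        } finally {
            TimeZone.setDefault(saved); // restore the global default
        }
    }

    public static void main(String[] args) {
        for (String zone : new String[] {"GMT+01:00", "GMT", "GMT-01:00"}) {
            // prints -3600000, 0 and 3600000 respectively
            System.out.println(zone + " -> " + millisUnderDefault("1970-01-01T00:00:00", zone));
        }
    }
}
```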

The first option (assuming UTC) seems more reasonable to me, but this is apparently not what Java tools, at least, are doing.
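If UTC (or any other zone) is the assumption you want, it can be stated explicitly instead of being inherited from the machine. A minimal java.time sketch, assuming the input stays within what DateTimeFormatter.ISO_DATE_TIME accepts (it is not guaranteed to match every xsd:dateTime lexical edge case); ExplicitZoneConversion and toDate are hypothetical names.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.util.Date;

// Hypothetical converter: the caller states which zone a timezoneless value is assumed to be in.
final class ExplicitZoneConversion {
    /** An explicit offset in the value always wins; assumedZone is used only when it is absent. */
    static Date toDate(String lexical, ZoneId assumedZone) {
        // ISO_DATE_TIME parses the offset optionally, so try OffsetDateTime first, then LocalDateTime.
        TemporalAccessor parsed = DateTimeFormatter.ISO_DATE_TIME
                .parseBest(lexical, OffsetDateTime::from, LocalDateTime::from);
        Instant instant = (parsed instanceof OffsetDateTime)
                ? ((OffsetDateTime) parsed).toInstant()
                : ((LocalDateTime) parsed).atZone(assumedZone).toInstant();
        return Date.from(instant);
    }

    public static void main(String[] args) {
        System.out.println(toDate("1970-01-01T00:00:00", ZoneOffset.UTC).getTime());        // 0
        System.out.println(toDate("1970-01-01T00:00:00", ZoneOffset.ofHours(1)).getTime()); // -3600000
        System.out.println(toDate("1970-01-01T00:00:00+01:00", ZoneOffset.UTC).getTime());  // -3600000
    }
}
```

Making the assumed zone a required parameter keeps the choice visible at every call site rather than hidden in the host configuration.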

So could someone please give me a pointer to a spec of some kind that defines the semantics of a timezoneless xsd:dateTime?

Thank you.

Update:

Current findings are:

My solution will be as follows:

Upvotes: 6

Views: 6757

Answers (2)

Michael Kay

Reputation: 163322

Basically the timezone is absent information, and there are many ways of interpreting absent information; in the end it's up to you. Possible interpretations are:

  • the timezone is unknown

  • the timezone can be established from the context, e.g. an associated place

  • the timezone is UTC

The XPath/XQuery/XSLT family of specifications assumes a context-defined timezone (the "implicit timezone" of the dynamic context). The context here could be the locale of the user, the timezone of the machine on which the software is running, or any number of other things.
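When the context timezone is a region (for example one derived from the user's locale) rather than a fixed offset, even the offset that gets applied depends on the date, which is where the summer-time variants mentioned in the question come from. A small java.time sketch; Europe/Berlin is an arbitrary example region, and ContextZoneOffsets and offsetIn are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper: which offset does a region-based context zone apply to a local value?
final class ContextZoneOffsets {
    static ZoneOffset offsetIn(String localDateTime, String regionId) {
        return LocalDateTime.parse(localDateTime).atZone(ZoneId.of(regionId)).getOffset();
    }

    public static void main(String[] args) {
        System.out.println(offsetIn("2000-01-16T12:00:00", "Europe/Berlin")); // +01:00 (winter time)
        System.out.println(offsetIn("2000-07-16T12:00:00", "Europe/Berlin")); // +02:00 (summer time)
    }
}
```

Local times that fall into a daylight-saving gap or overlap are resolved by LocalDateTime.atZone's own rules (shifted later by the gap length, or the earlier offset in an overlap), which is yet another implicit choice.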

In a sense it's no different from omitting the time and giving only a date. What exactly do you mean when you say you were born on 21 March 1973? What timezone are you talking about? The assumption is probably that you've left out the information because no-one is likely to care.

Upvotes: 5

Petru Gardea

Reputation: 21638

This is what I've used myself. It all starts from the dateTime spec:

"Local" or untimezoned times are presumed to be the time in the timezone of some unspecified locality as prescribed by the appropriate legal authority; currently there are no legally prescribed timezones which are durations whose magnitude is greater than 14 hours. The value of each numeric-valued property (other than timeOnTimeline) is limited to the maximum value within the interval determined by the next-higher property. For example, the day value can never be 32, and cannot even be 29 for month 02 and year 2002 (February 2002).

If that is confusing, then go to section 3.2.7.2, Order relation on dateTime.

Excerpts (to meet posting criteria here):

The ordering between two dateTimes P and Q is defined by the following algorithm [...] A. Normalize P and Q. That is, if there is a timezone present, but it is not Z, convert it to Z [...]
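Step A has a direct java.time counterpart: OffsetDateTime.withOffsetSameInstant(ZoneOffset.UTC) keeps the instant and rewrites the offset as Z, standing in for the spec's duration-addition step. It only applies to values that carry a timezone, since OffsetDateTime.parse rejects a timezoneless string. NormalizeToZ and normalize are hypothetical names.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hypothetical sketch of step A: re-express a timezoned value in UTC without changing the instant.
final class NormalizeToZ {
    static OffsetDateTime normalize(String lexical) {
        return OffsetDateTime.parse(lexical).withOffsetSameInstant(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        System.out.println(normalize("2000-01-16T12:00:00+01:00")); // 2000-01-16T11:00Z
        System.out.println(normalize("2000-01-16T00:00:00-05:00")); // 2000-01-16T05:00Z
    }
}
```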

These would be relevant:

C. Otherwise, if P contains a time zone and Q does not, compare as follows:
  1. P < Q if P < (Q with time zone +14:00)
  2. P > Q if P > (Q with time zone -14:00)
  3. P <> Q otherwise, that is, if (Q with time zone +14:00) < P < (Q with time zone -14:00)

D. Otherwise, if P does not contain a time zone and Q does, compare as follows:
  1. P < Q if (P with time zone -14:00) < Q.
  2. P > Q if (P with time zone +14:00) > Q.
  3. P <> Q otherwise, that is, if (P with time zone +14:00) < Q < (P with time zone -14:00)

The "magic number" 14, from 3.2.7:

[...]currently there are no legally prescribed timezones which are durations whose magnitude is greater than 14 hours.
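Put differently, a timezoneless value pins down only a 28-hour window of possible instants: reading the local time at the +14:00 bound gives the earliest instant, and at the -14:00 bound the latest. A minimal java.time sketch of that window; LocalWindow, earliest and latest are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helper: the range of instants a timezoneless dateTime could denote.
final class LocalWindow {
    private static final ZoneOffset PLUS_14 = ZoneOffset.ofHours(14);
    private static final ZoneOffset MINUS_14 = ZoneOffset.ofHours(-14);

    /** Earliest possible instant: the local time read at the +14:00 bound. */
    static Instant earliest(String local) {
        return LocalDateTime.parse(local).toInstant(PLUS_14);
    }

    /** Latest possible instant: the local time read at the -14:00 bound. */
    static Instant latest(String local) {
        return LocalDateTime.parse(local).toInstant(MINUS_14);
    }

    public static void main(String[] args) {
        String local = "2000-01-16T12:00:00";
        System.out.println(earliest(local));                                  // 2000-01-15T22:00:00Z
        System.out.println(latest(local));                                    // 2000-01-17T02:00:00Z
        System.out.println(Duration.between(earliest(local), latest(local))); // PT28H
    }
}
```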

Of course, you could run into indeterminate scenarios, that is, cases where the order cannot be determined:

2000-01-01T12:00:00 <> 1999-12-31T23:00:00Z

2000-01-16T12:00:00 <> 2000-01-16T12:00:00Z

2000-01-16T00:00:00 <> 2000-01-16T12:00:00Z
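Rule A and rules C and D quoted above, together with the spec's rule B (elided in the excerpt: if both values have a timezone or both lack one, compare them field by field), can be sketched as a four-valued comparison and checked against the three indeterminate pairs above. This is an illustration on top of java.time, not a validated implementation of the spec: XsdDateTimeOrder, Order and compare are hypothetical names, and parsing assumes the inputs are accepted by DateTimeFormatter.ISO_DATE_TIME.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;

// Hypothetical sketch of the xsd:dateTime partial order (rules A-D).
final class XsdDateTimeOrder {
    enum Order { LESS, EQUAL, GREATER, INDETERMINATE }

    private static final ZoneOffset PLUS_14 = ZoneOffset.ofHours(14);
    private static final ZoneOffset MINUS_14 = ZoneOffset.ofHours(-14);

    /** OffsetDateTime if the value has a timezone, otherwise LocalDateTime. */
    private static TemporalAccessor parse(String lexical) {
        return DateTimeFormatter.ISO_DATE_TIME.parseBest(lexical, OffsetDateTime::from, LocalDateTime::from);
    }

    private static Order sign(int c) {
        return c < 0 ? Order.LESS : c > 0 ? Order.GREATER : Order.EQUAL;
    }

    static Order compare(String p, String q) {
        TemporalAccessor a = parse(p);
        TemporalAccessor b = parse(q);
        boolean aZoned = a instanceof OffsetDateTime;
        boolean bZoned = b instanceof OffsetDateTime;
        if (aZoned && bZoned) {
            // Rules A + B: normalize both to Z, then compare the instants.
            return sign(((OffsetDateTime) a).toInstant().compareTo(((OffsetDateTime) b).toInstant()));
        }
        if (!aZoned && !bZoned) {
            // Rule B: both timezoneless, compare year..second chronologically.
            return sign(((LocalDateTime) a).compareTo((LocalDateTime) b));
        }
        if (aZoned) {
            // Rule C: P has a timezone, Q does not.
            Instant pInstant = ((OffsetDateTime) a).toInstant();
            LocalDateTime qLocal = (LocalDateTime) b;
            if (pInstant.isBefore(qLocal.toInstant(PLUS_14))) return Order.LESS;
            if (pInstant.isAfter(qLocal.toInstant(MINUS_14))) return Order.GREATER;
            return Order.INDETERMINATE;
        }
        // Rule D: P has no timezone, Q does.
        LocalDateTime pLocal = (LocalDateTime) a;
        Instant qInstant = ((OffsetDateTime) b).toInstant();
        if (pLocal.toInstant(MINUS_14).isBefore(qInstant)) return Order.LESS;
        if (pLocal.toInstant(PLUS_14).isAfter(qInstant)) return Order.GREATER;
        return Order.INDETERMINATE;
    }

    public static void main(String[] args) {
        System.out.println(compare("2000-01-01T12:00:00", "1999-12-31T23:00:00Z")); // INDETERMINATE
        System.out.println(compare("2000-01-16T12:00:00", "2000-01-16T12:00:00Z")); // INDETERMINATE
        System.out.println(compare("2000-01-16T00:00:00", "2000-01-16T12:00:00Z")); // INDETERMINATE
        System.out.println(compare("2000-01-15T12:00:00", "2000-01-16T12:00:00Z")); // LESS
    }
}
```

Returning INDETERMINATE instead of forcing LESS or GREATER is the point: without extra context, such a pair simply has no order.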

It is really hard to tell what kind of assumption you should make. You need to chase down and understand how that value was captured and then passed on to you in XML, since both assumptions can be wrong! If this data is passed around and eventually sent back to systems in the same realm as the one that sent it, a safe practice is to make sure you always keep a "string" copy of that data.

I really don't think that the results you're getting are random. You just need to read a bit more of these specs. And I am not saying it is easy; it is the way it is. Plus, this is not about XML or XSD; it is about timezones in general.

Upvotes: 2
