David Thielen
David Thielen

Reputation: 32966

Does JSON to XML lose me anything?

We have a program that accepts as data XML, JSON, SQL, OData, etc. For the XML we use Saxon and its XPath support and that works fantastic.

For JSON we use the jsonPath library which is not as powerful as XPath 3.1. And jsonPath is a little squirrelly in some corner cases.

So... what if we convert the JSON we get to XML and then use Saxon? Are there limitations to that approach? Are there JSON constructs that won't convert to XML, like anonymous arrays?

Upvotes: 0

Views: 435

Answers (2)

Michael Kay
Michael Kay

Reputation: 163458

The headline question: The json-to-xml() function in XPath 3.1 is lossless, except that by default, characters that are invalid in XML (such as NUL, or unpaired surrogates) are replaced by a SUB character -- you can change this behaviour with the option escape=true.

The losslessness has been achieved at some cost in convenience. For example, JSON property names are not translated to XML element or attribute names, but rather to values of the key attribute.

Upvotes: 3

Martin Honnen
Martin Honnen

Reputation: 167696

Lots of different people have come up with lots of different conversions of JSON to XML. As already pointed out, the XPath 3.1 and the XSLT 3.0 spec have a loss-less, round-tripping conversion with json-to-xml and xml-to-json that can handle any JSON.

There are simpler conversions that handle limited sets of JSON, the main problem is how to represent property names of JSON that don't map to XML names e.g. { "prop 1" : "value" } is represented by json-to-xml as <string key="prop 1">value</string> while conversions trying to map the property name to an element or attribute name either fail to create well-formed XML (e.g. <prop 1>value</prop 1>) or have to escape the space in the element name (e.g. <prop_1>value</prop_1> or some hex representation of the Unicode of the space inserted).

In the end I guess you want to select the property foo in { "foo" : "value" } as foo which the simple conversion would give you; in XPath 3.1 you would need ?foo for the XDM map or fn:string[@key = 'foo'] for the json-to-xml result format.

With { "prop 1" : "value" } the latter kind of remains as fn:string[@key = 'prop 1'], the ? approach needs to be changed to ?('prop 1') or .('prop 1'). Any conversion that has escaped the space in an element name requires you to change the path to e.g. prop_1.

There is no ideal way for all kind of JSON I think, in the end it depends on the JSON formats you expect and the willingness or time of users to learn a new selection/querying approach.

Of course you can use other JSON to XML conversions than the json-to-xml and then use XPath 3.1 on any XML format; I think that is what the oXygen guys opted for, they had some JSON to XML conversion before XPath 3.1 provided one and are mainly sticking with it, so in oXygen you can write "path" expressions against JSON as under the hood the path is evaluated against an XML conversion of the JSON. I am not sure which effort it takes to indicate which JSON values in the original JSON have been selected by XPath path expressions in the XML format, that is probably not that easy and straightforward.

Upvotes: 1

Related Questions