adb

Reputation: 153

Manipulate order of XML element attributes

I have read many posts where people have asked about enforcing some order of attributes on an XML element, and the general response is that it's not legal/required/allowed/relevant/other.

I am not looking for any response saying I shouldn't care about attribute order, so please don't reply if that's your view.

I have a real problem which needs a solution. A large corporate product treats the following two elements as different in the latest version of their product:

<objquestion allowmultiple="true" id="7432" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice">
<objquestion id="7432" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice" allowmultiple="true">

Particularly, if the allowmultiple attribute is after the questiontype it acts as a modifier to the question type. If it's before, it's ignored - it shouldn't be.

So, they are unlikely to fix their product in the short term.

I am manipulating this XML content using

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
doc = dbf.newDocumentBuilder().parse(new InputSource(path));

and the internal implementation will sort the attributes in the DOM node map. When it's written back to the file it writes the attributes in their now sorted order. I have a lot of code that is playing around with the Document object using XPath.

When I have finished manipulating the Document I currently write it back with

Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.transform(new DOMSource(lc.getDocument()), new StreamResult(new File(paths[1])));

DESIRED RESULT: What I need to be able to do is to ensure the allowmultiple attribute is written AFTER the questiontype.

GOOD: So always like this...

<objquestion ... questiontype="multchoice" ... allowmultiple="true" ... >

BAD: ...and never like this:

<objquestion ... allowmultiple="true" ... questiontype="multchoice" ... >

I have tried to understand if I can either affect the serialisation used to write the DOM tree back or if I can simply substitute a different implementation that does not parse the attributes into a sorted map initially. I guess both would work, but I've not been able to find out how to do this.

I looked at LSSerializer, but I am not sure how I can intercept that particular <objquestion> element. Would I have to extend a FileOutputStream and look for something?

I have read that SAX might not do the initial sorting, but I need to be able to drop in the parser without much new code and am not so strong with the whole XML world.

Can anyone suggest a way to do this?

Upvotes: 3

Views: 5657

Answers (4)

StackzOfZtuff

Reputation: 3108

Building on abinet's answer of renaming attributes.

This is still terrible hackery, but at least it does NOT use sed. (It'll be useful to me. Running full-blown Saxon and/or XSL is too much for me.)

First rename the attribute via xmlstarlet, then canonicalize, then rename it back to the original.

1. Original

$ cat input.xml
<?xml version="1.0"?>
<xml>
  <objquestion allowmultiple="true" id="7432" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice"></objquestion>
  <objquestion id="7433" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice" allowmultiple="true"></objquestion>
</xml>

2. Rename

Rename the attribute so that it sorts where you want it in alphabetical sequence:

$ cat input.xml \
    | xmlstarlet edit --rename '//@allowmultiple' -v 'zzz_allowmultiple'
<?xml version="1.0"?>
<xml>
  <objquestion zzz_allowmultiple="true" id="7432" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice"/>
  <objquestion id="7433" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice" zzz_allowmultiple="true"/>
</xml>

3. Sort

Canonicalization will sort the attributes alphabetically:

$ cat input.xml \
    | xmlstarlet edit --rename '//@allowmultiple' -v 'zzz_allowmultiple' \
    | xmlstarlet canonic
<xml>
  <objquestion id="7432" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice" zzz_allowmultiple="true"></objquestion>
  <objquestion id="7433" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice" zzz_allowmultiple="true"></objquestion>
</xml>

4. Back

Rename the attribute back to its original name:

$ cat input.xml \
    | xmlstarlet edit --rename '//@allowmultiple' -v 'zzz_allowmultiple' \
    | xmlstarlet canonic \
    | xmlstarlet edit --rename '//@zzz_allowmultiple' -v 'allowmultiple'
<?xml version="1.0"?>
<xml>
  <objquestion id="7432" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice" allowmultiple="true"/>
  <objquestion id="7433" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice" allowmultiple="true"/>
</xml>

Upvotes: 0

abinet

Reputation: 2808

It sounds like a hack, but you can rename this attribute to something like x1allowmultiple, and then it will be the last one:

  • search and replace all occurrences of allowmultiple with x1allowmultiple
  • do processing and create output file with x1allowmultiple
  • search and replace all occurrences of x1allowmultiple with allowmultiple
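
The three steps above can be sketched in Java roughly as follows. This is only an illustration, not tested against the product in question: the class name AttributeRenameSketch and the method roundTrip are made up for the example, it works on strings rather than files, and it relies on the behaviour described in the question (the JDK's built-in DOM keeps attributes sorted by name), which is an implementation detail rather than a guarantee. The plain text replace would also hit any attribute whose name merely ends in allowmultiple, so check your data first.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
public class AttributeRenameSketch {
    static final String REAL = "allowmultiple=";
    // Temporary name chosen so that it sorts after "questiontype".
    static final String TEMP = "x1allowmultiple=";

    static String roundTrip(String xml) throws Exception {
        // Step 1: rename the attribute in the raw text before parsing.
        String renamed = xml.replace(REAL, TEMP);

        // Step 2: parse and serialize as in the question.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(renamed)));
        // ... existing XPath manipulation of doc would go here,
        // using the temporary attribute name ...
        Transformer xformer = TransformerFactory.newInstance().newTransformer();
        xformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        xformer.transform(new DOMSource(doc), new StreamResult(out));

        // Step 3: restore the original attribute name in the output text.
        return out.toString().replace(TEMP, REAL);
    }

    public static void main(String[] args) throws Exception {
        String in = "<objquestion allowmultiple=\"true\" id=\"7432\" "
                + "questiontype=\"multchoice\"/>";
        System.out.println(roundTrip(in));
    }
}
```

Any XPath expressions that touch the attribute between parsing and writing would have to use the temporary name, which is the main cost of this hack.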

Upvotes: 1

Michael Kay

Reputation: 163655

The next Saxon release (9.5, due imminently) has a serialization attribute (saxon:attribute-order) that allows you to control attribute order. It was added for legitimate use cases (it can improve human readability to have id attributes always come first, for example), and I slightly regret that it's going to end up being used for use cases like yours that result from the incompetence and irresponsibility of the programmers employed by large corporations, but so be it: if it solves a problem, I won't cry.

Upvotes: 5

MMendes

Reputation: 84

You can use this JAXB approach and see this example for element attribute ordering.

Upvotes: 0
