Reputation: 153
I have read many posts where people have asked about enforcing some order on the attributes of an XML element, and the general response is that it's not legal/required/allowed/relevant/other.
I am not looking for any response saying I shouldn't care about attribute order, so please don't reply if that's your view.
I have a real problem which needs a solution. The latest version of a large corporate product treats the following two elements as different:
<objquestion allowmultiple="true" id="7432" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice">
<objquestion id="7432" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice" allowmultiple="true">
In particular, if the allowmultiple attribute comes after questiontype, it acts as a modifier to the question type. If it comes before, it is ignored, which it shouldn't be.
They are unlikely to fix their product in the short term.
I am manipulating this XML content using
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
doc = dbf.newDocumentBuilder().parse(new InputSource(path));
and the internal implementation will sort the attributes in the DOM node map. When it's written back to the file it writes the attributes in their now sorted order. I have a lot of code that is playing around with the Document object using XPath.
When I have finished manipulating the Document I currently write it back with
Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.transform(new DOMSource(lc.getDocument()), new StreamResult(new File(paths[1])));
DESIRED RESULT: What I need to be able to do is to ensure the allowmultiple attribute is written AFTER the questiontype attribute.
GOOD: So always like this...
<objquestion ... questiontype="multchoice" ... allowmultiple="true" ... >
BAD: ...and never like this:
<objquestion ... allowmultiple="true" ... questiontype="multchoice" ... >
I have tried to understand if I can either affect the serialisation used to write the DOM tree back or if I can simply substitute a different implementation that does not parse the attributes into a sorted map initially. I guess both would work, but I've not been able to find out how to do this.
I looked at LSSerializer, but I am not sure how I can intercept that particular <objquestion> element. Would I have to extend a FileOutputStream and look for something?
I have read that SAX might not do the initial sorting, but I need to be able to drop in the parser without much new code and am not so strong with the whole XML world.
Can anyone suggest a way to do this?
Upvotes: 3
Views: 5657
Reputation: 3108
Building on abinet's answer of renaming attributes.
This is still terrible hackery but at least does NOT use sed. (It'll be useful to me. Running full-blown Saxon and/or XSL is too much for me.)
First rename attribute via xmlstarlet, then canonicalize, then rename back to original.
$ cat input.xml
<?xml version="1.0"?>
<xml>
<objquestion allowmultiple="true" id="7432" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice"></objquestion>
<objquestion id="7433" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice" allowmultiple="true"></objquestion>
</xml>
Rename attribute to be where you want it in alphabetical sequence:
$ cat input.xml \
| xmlstarlet edit --rename '//@allowmultiple' -v 'zzz_allowmultiple'
<?xml version="1.0"?>
<xml>
<objquestion zzz_allowmultiple="true" id="7432" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice"/>
<objquestion id="7433" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice" zzz_allowmultiple="true"/>
</xml>
Canonicalization will sort attributes alphabetically:
$ cat input.xml \
| xmlstarlet edit --rename '//@allowmultiple' -v 'zzz_allowmultiple' \
| xmlstarlet canonic
<xml>
<objquestion id="7432" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice" zzz_allowmultiple="true"></objquestion>
<objquestion id="7433" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice" zzz_allowmultiple="true"></objquestion>
</xml>
Rename attribute back to original name:
$ cat input.xml \
| xmlstarlet edit --rename '//@allowmultiple' -v 'zzz_allowmultiple' \
| xmlstarlet canonic \
| xmlstarlet edit --rename '//@zzz_allowmultiple' -v 'allowmultiple'
<?xml version="1.0"?>
<xml>
<objquestion id="7432" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice" allowmultiple="true"/>
<objquestion id="7433" idtext="7433" idvar="7429" parent="7430" questiontype="multchoice" allowmultiple="true"/>
</xml>
Upvotes: 0
Reputation: 2808
It sounds like a hack, but you can rename this attribute to something like x1allowmultiple, and then it will be the last one:
replace allowmultiple with x1allowmultiple in the input
process the document with x1allowmultiple in place
replace x1allowmultiple with allowmultiple in the output
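In Java, the rename trick might look roughly like the sketch below. It assumes, as the question reports, that the DOM implementation keeps attributes sorted by name, so a placeholder starting with zzz_ ends up last. The class name RenameAttributeHack, the method serializeWithAttributeLast and the placeholder zzz_allowmultiple are made up for illustration. The final text replacement is naive and would also rewrite a matching string inside text content.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RenameAttributeHack {
    static final String REAL = "allowmultiple";
    // Hypothetical placeholder: it must sort after every other attribute name
    // and must not already occur in the document.
    static final String TEMP = "zzz_allowmultiple";

    static String serializeWithAttributeLast(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Step 1: rename allowmultiple to the placeholder on every objquestion.
        NodeList questions = doc.getElementsByTagName("objquestion");
        for (int i = 0; i < questions.getLength(); i++) {
            Element q = (Element) questions.item(i);
            if (q.hasAttribute(REAL)) {
                String value = q.getAttribute(REAL);
                q.removeAttribute(REAL);
                q.setAttribute(TEMP, value);
            }
        }

        // Step 2: serialize as in the question, but into a String.
        Transformer xformer = TransformerFactory.newInstance().newTransformer();
        xformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        xformer.transform(new DOMSource(doc), new StreamResult(out));

        // Step 3: rename the placeholder back with a plain text replacement.
        return out.toString().replace(" " + TEMP + "=", " " + REAL + "=");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<xml><objquestion allowmultiple=\"true\" id=\"7432\" "
                + "questiontype=\"multchoice\"/></xml>";
        System.out.println(serializeWithAttributeLast(xml));
    }
}
```

The returned string can then be written to the target file in place of the direct StreamResult(new File(...)) call.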
Upvotes: 1
Reputation: 163655
The next Saxon release (9.5, due imminently) has a serialization attribute (saxon:attribute-order) that allows you to control attribute order. It was added for legitimate use cases (it can improve human readability to have id attributes always come first, for example), and I slightly regret that it's going to end up being used for use cases like yours that result from the incompetence and irresponsibility of the programmers employed by large corporations, but so be it: if it solves a problem, I won't cry.
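If Saxon is not an option, the same idea of choosing the order at serialization time can be approximated with the JDK alone, since XMLStreamWriter emits attributes in the order in which writeAttribute is called. The following is only a sketch under assumptions: OrderedAttributeWriter and its methods are hypothetical names, it handles elements, text, CDATA and comments only, and it ignores namespaces, processing instructions and the XML declaration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class OrderedAttributeWriter {
    // The attribute that must be written after all others.
    static final String LAST = "allowmultiple";

    // Recursively copies a DOM subtree to the stream writer.
    static void write(Node node, XMLStreamWriter w) throws Exception {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                w.writeStartElement(node.getNodeName());
                NamedNodeMap map = node.getAttributes();
                List<Attr> attrs = new ArrayList<>();
                for (int i = 0; i < map.getLength(); i++) {
                    attrs.add((Attr) map.item(i));
                }
                // Stable sort on a boolean key: false sorts before true, so LAST
                // moves to the end and the others keep their DOM order.
                attrs.sort(Comparator.comparing((Attr a) -> a.getName().equals(LAST)));
                for (Attr a : attrs) {
                    w.writeAttribute(a.getName(), a.getValue());
                }
                for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                    write(c, w);
                }
                w.writeEndElement();
                break;
            }
            case Node.TEXT_NODE:
                w.writeCharacters(node.getNodeValue());
                break;
            case Node.CDATA_SECTION_NODE:
                w.writeCData(node.getNodeValue());
                break;
            case Node.COMMENT_NODE:
                w.writeComment(node.getNodeValue());
                break;
            default:
                // Other node types are skipped in this sketch.
                break;
        }
    }

    static String serialize(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        write(doc.getDocumentElement(), w);
        w.flush();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<xml><objquestion allowmultiple=\"true\" id=\"7432\" "
                + "questiontype=\"multchoice\">text</objquestion></xml>";
        System.out.println(serialize(xml));
    }
}
```

Because the ordering is decided while writing, this does not depend on how the parser stores attributes internally.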
Upvotes: 5