Reputation: 11316
When processing XML by means of standard DOM, attribute order is not guaranteed after you serialize back. At last that is what I just realized when using standard java XML Transform API to serialize the output.
However I do need to keep an order. I would like to know if there is any posibility on Java to keep the original order of attributes of an XML file processed by means of DOM API, or any way to force the order (maybe by using an alternative serialization API that lets you set this kind of property). In my case processing reduces to alter the value of some attributes (not all) of a sequence of the same elements with a bunch of attributes, and maybe insert a few more elements.
Is there any "easy" way or do I have to define my own XSLT transformation stylesheet to specify the output and altering the whole input XML file?
Update I must thank all your answers. The answer seems now more obvious than I expected. I never paid any attention to attribute order, since I had never needed it before.
The main reason to require an attribute order is that the resulting XML file just looks different. The target is a configuration file that holds hundreds of alarms (every alarm is defined by a set of attributes). This file usually has little modifications over time, but it is convenient to keep it ordered, since when we need to modify something it is edited by hand. Now and then some projects need light modifications of this file, such as setting one of the attributes to a customer specific code.
I just developed a little application to merge original file (common to all projects) with specific parts of each project (modify the value of some attributes), so project-specific file gets the updates of the base one (new alarm definitions or some attribute values bugfixes). My main motivation to require ordered attributes is to be able to check the output of the application againts the original file by means of a text comparation tool (such as Winmerge). If the format (mainly attribute order) remains the same, the differences can be easily spotted.
I really thought this was possible, since XML handling programs, such as XML Spy, lets you edit XML files and apply some ordering (grid mode). Maybe my only choice is to use one of these programs to manually modify the output file.
Upvotes: 48
Views: 53588
Reputation: 1
Inspired by the answer of Andrey Lebedenko.
Capable of sorting by a Nodes attribute or by a Nodes text content.
Ready to be used in Your XML utility class.
public static Collection<Node> nodeListCollection(final NodeList nodeList) {
if (nodeList == null) {
return Collections.emptyList();
}
final int length = nodeList.getLength();
if (length == 0) {
return Collections.emptyList();
}
return IntStream.range(0, length)
.mapToObj(nodeList::item)
.collect(Collectors.toList());
}
private static int compareString(final String str1, final String str2, final boolean nullIsLess) {
if (Objects.equals(str1, str2)) {
return 0;
}
if (str1 == null) {
return nullIsLess ? -1 : 1;
}
if (str2 == null) {
return nullIsLess ? 1 : -1;
}
return str1.compareTo(str2);
}
private static final Function<Boolean, Comparator<Node>> StringNodeValueComparatorSupplier = (asc) ->
(Node a, Node b) -> {
final String va = a == null ? null : a.getTextContent();
final String vb = b == null ? null : b.getTextContent();
return (asc ? 1 : -1) * compareString(va, vb,asc);
};
private static final BiFunction<Boolean, String, Comparator<Node>> StringNodeAttributeComparatorSupplier = (asc, attrName) ->
(Node a, Node b) -> {
final String va = a == null ? null : a.hasAttributes() ?
((Element) a).getAttribute(attrName) : null;
final String vb = b == null ? null : b.hasAttributes() ?
((Element) b).getAttribute(attrName) : null;
return (asc ? 1 : -1) * compareString(va, vb,asc);
};
private static <T extends Comparable<T>> Comparator<Node> nodeComparator(
final boolean asc,
final boolean useAttr,
final String attribute,
final Constructor<T> constructor
) {
return (Node a, Node b) -> {
if (a == null && b == null) {
return 0;
} else if (a == null) {
return (asc ? -1 : 1);
} else if (b == null) {
return (asc ? 1 : -1);
}
T aV;
try {
final String aStr;
if (useAttr) {
aStr = a.hasAttributes() ? ((Element) a).getAttribute(attribute) : null;
} else {
aStr = a.getTextContent();
}
aV = aStr == null || aStr.matches("\\s+") ? null : constructor.newInstance(aStr);
} catch (Exception ignored) {
aV = null;
}
T bV;
try {
final String bStr;
if (useAttr) {
bStr = b.hasAttributes() ? ((Element) b).getAttribute(attribute) : null;
} else {
bStr = b.getTextContent();
}
bV = bStr == null || bStr.matches("\\s+") ? null : constructor.newInstance(bStr);
} catch (Exception ignored) {
bV = null;
}
final int ret;
if (aV == null && bV == null) {
ret = 0;
} else if (aV == null) {
ret = -1;
} else if (bV == null) {
ret = 1;
} else {
ret = aV.compareTo(bV);
}
return (asc ? 1 : -1) * ret;
};
}
/**
* Method to sort any NodeList by an attribute all nodes must have. <br>If the attribute is absent for a signle
* {@link Node} or the {@link NodeList} does contain elements without Attributes, null is used instead. <br>If
* <code>asc</code> is
* <code>true</code>, nulls first, else nulls last.
*
* @param nodeList The {@link NodeList} containing all {@link Node} to sort.
* @param attribute Name of the attribute to extract and compare
* @param asc <code>true</code>: ascending, <code>false</code>: descending
* @param compareType Optional class to use for comparison. Must implement {@link Comparable} and have Constructor
* that takes a single {@link String} argument. If <code>null</code> is supplied, {@link String} is used.
* @return A collection of the {@link Node}s passed as {@link NodeList}
* @throws RuntimeException If <code>compareType</code> does not have a constructor taking a single {@link String}
* argument. Also, if the comparator created does violate the {@link Comparator} contract, an
* {@link IllegalArgumentException} is raised.
* @implNote Exceptions during calls of the single String argument constructor of <code>compareType</code> are
* ignored. Values are substituted by <code>null</code>
*/
public static <T extends Comparable<T>> Collection<Node> sortNodesByAttribute(
final NodeList nodeList,
String attribute,
boolean asc,
Class<T> compareType) {
final Comparator<Node> nodeComparator;
if (compareType == null) {
nodeComparator = StringNodeAttributeComparatorSupplier.apply(asc, attribute);
} else {
final Constructor<T> constructor;
try {
constructor = compareType.getDeclaredConstructor(String.class);
} catch (NoSuchMethodException e) {
throw new RuntimeException(
"Cannot compare Node Attribute '" + attribute + "' using the Type '" + compareType.getName()
+ "': No Constructor available that takes a single String argument.", e);
}
nodeComparator = nodeComparator(asc, true, attribute, constructor);
}
final List<Node> nodes = new ArrayList<>(nodeListCollection(nodeList));
nodes.sort(nodeComparator);
return nodes;
}
/**
* Method to sort any NodeList by their text content using an optional type. <br>If
* <code>asc</code> is
* <code>true</code>, nulls first, else nulls last.
*
* @param nodeList The {@link NodeList} containing all {@link Node}s to sort.
* @param asc <code>true</code>: ascending, <code>false</code>: descending
* @param compareType Optional class to use for comparison. Must implement {@link Comparable} and have Constructor
* that takes a single {@link String} argument. If <code>null</code> is supplied, {@link String} is used.
* @return A collection of the {@link Node}s passed as {@link NodeList}
* @throws RuntimeException If <code>compareType</code> does not have a constructor taking a single {@link String}
* argument. Also, if the comparator created does violate the {@link Comparator} contract, an
* {@link IllegalArgumentException} is raised.
* @implNote Exceptions during calls of the single String argument constructor of <code>compareType</code> are
* ignored. Values are substituted by <code>null</code>
*/
public static <T extends Comparable<T>> Collection<Node> sortNodes(
final NodeList nodeList,
boolean asc,
Class<T> compareType) {
final Comparator<Node> nodeComparator;
if (compareType == null) {
nodeComparator = StringNodeValueComparatorSupplier.apply(asc);
} else {
final Constructor<T> constructor;
try {
constructor = compareType.getDeclaredConstructor(String.class);
} catch (NoSuchMethodException e) {
throw new RuntimeException(
"Cannot compare Nodes using the Type '" + compareType.getName()
+ "': No Constructor available that takes a single String argument.", e);
}
nodeComparator = nodeComparator(asc, false, null, constructor);
}
final List<Node> nodes = new ArrayList<>(nodeListCollection(nodeList));
nodes.sort(nodeComparator);
return nodes;
}
Upvotes: 0
Reputation: 7230
I think I can find some valid justifications for caring about attribute order:
It seems like Alain Pannetier's solution is the way to go.
Also, you may want to take a look at DecentXML; it gives you full control of how the XML is formatted, even though it's not DOM-compatible. Specially useful if you want to modify some hand-edited XML without losing the formatting.
Upvotes: 4
Reputation: 1988
Kind of works...
package mynewpackage;
// for the method
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
// for the test example
import org.xml.sax.InputSource;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;
import org.w3c.dom.Document;
import java.math.BigDecimal;
public class NodeTools {
/**
* Method sorts any NodeList by provided attribute.
* @param nl NodeList to sort
* @param attributeName attribute name to use
* @param asc true - ascending, false - descending
* @param B class must implement Comparable and have Constructor(String) - e.g. Integer.class , BigDecimal.class etc
* @return
*/
public static Node[] sortNodes(NodeList nl, String attributeName, boolean asc, Class<? extends Comparable> B)
{
class NodeComparator<T> implements Comparator<T>
{
@Override
public int compare(T a, T b)
{
int ret;
Comparable bda = null, bdb = null;
try{
Constructor bc = B.getDeclaredConstructor(String.class);
bda = (Comparable)bc.newInstance(((Element)a).getAttribute(attributeName));
bdb = (Comparable)bc.newInstance(((Element)b).getAttribute(attributeName));
}
catch(Exception e)
{
return 0; // yes, ugly, i know :)
}
ret = bda.compareTo(bdb);
return asc ? ret : -ret;
}
}
List<Node> x = new ArrayList<>();
for(int i = 0; i < nl.getLength(); i++)
{
x.add(nl.item(i));
}
Node[] ret = new Node[x.size()];
ret = x.toArray(ret);
Arrays.sort(ret, new NodeComparator<Node>());
return ret;
}
public static void main(String... args)
{
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
String s = "<xml><item id=\"1\" price=\"100.00\" /><item id=\"3\" price=\"29.99\" /><item id=\"2\" price=\"5.10\" /></xml>";
Document doc = null;
try
{
builder = factory.newDocumentBuilder();
doc = builder.parse(new InputSource(new StringReader(s)));
}
catch(Exception e) { System.out.println("Alarm "+e); return; }
System.out.println("*** Sort by id ***");
Node[] ret = NodeTools.sortNodes(doc.getElementsByTagName("item"), "id", true, Integer.class);
for(Node n: ret)
{
System.out.println(((Element)n).getAttribute("id")+" : "+((Element)n).getAttribute("price"));
}
System.out.println("*** Sort by price ***");
ret = NodeTools.sortNodes(doc.getElementsByTagName("item"), "price", true, BigDecimal.class);
for(Node n: ret)
{
System.out.println(((Element)n).getAttribute("id")+" : "+((Element)n).getAttribute("price"));
}
}
}
In my simple test it prints:
*** Sort by id ***
1 : 100.00
2 : 5.10
3 : 29.99
*** Sort by price ***
2 : 5.10
3 : 29.99
1 : 100.00
Upvotes: 0
Reputation: 4686
You can still do this using the standard DOM and Transformation API by using a quick and dirty solution like the one I am describing:
We know that the transformation API solution orders the attributes alphabetically. You can prefix the attributes names with some easy-to-strip-later strings so that they will be output in the order you want. Simple prefixes as "a_" "b_" etc should suffice in most situations and can be easily stripped from the output xml using a one liner regex.
If you are loading an xml and resave and want to preserve attributes order, you can use the same principle, by first modifying the attribute names in the input xml text and then parsing it into a Document object. Again, make this modification based on a textual processing of the xml. This can be tricky but can be done by detecting elements and their attributes strings, again, using regex. Note that this is a dirty solution. There are many pitfalls when parsing XML on your own, even for something as simple as this, so be careful if you decide to implement this.
Upvotes: 1
Reputation: 1
I have a quite similar problem. I need to have always the same attribute for first. Example :
<h50row a="1" xidx="1" c="1"></h50row>
<h50row a="2" b="2" xidx="2"></h50row>
must become
<h50row xidx="1" a="1" c="1"></h50row>
<h50row xidx="2" a="2" b="2"></h50row>
I found a solution with a regex:
test = "<h50row a=\"1\" xidx=\"1\" c=\"1\"></h50row>";
test = test.replaceAll("(<h5.*row)(.*)(.xidx=\"\\w*\")([^>]*)(>)", "$1$3$2$4$5");
Hope you find this usefull
Upvotes: -1
Reputation: 352
Sorry to say, but the answer is more subtle than "No you can't" or "Why do you need to do this in the first place ?".
The short answer is "DOM will not allow you to do that, but SAX will".
This is because DOM does not care about the attribute order, since it's meaningless as far as the standard is concerned, and by the time the XSL gets hold of the input stream, the info is already lost.
Most XSL engine will actually gracefully preserve the input stream attribute order (e.g.
Xalan-C (except in one case) or Xalan-J (always)). Especially if you use <xsl:copy*>
.
Cases where the attribute order is not kept, best of my knowledge, are.
- If the input stream is a DOM
- Xalan-C: if you insert your result-tree tags literally (e.g. <elem att1={@att1} .../>
Here is one example with SAX, for the record (inhibiting DTD nagging as well).
SAXParserFactory spf = SAXParserFactoryImpl.newInstance();
spf.setNamespaceAware(true);
spf.setValidating(false);
spf.setFeature("http://xml.org/sax/features/validation", false);
spf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
spf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
SAXParser sp = spf.newSAXParser() ;
Source src = new SAXSource ( sp.getXMLReader(), new InputSource( input.getAbsolutePath() ) ) ;
String resultFileName = input.getAbsolutePath().replaceAll(".xml$", ".cooked.xml" ) ;
Result result = new StreamResult( new File (resultFileName) ) ;
TransformerFactory tf = TransformerFactory.newInstance();
Source xsltSource = new StreamSource( new File ( COOKER_XSL ) );
xsl = tf.newTransformer( xsltSource ) ;
xsl.setParameter( "srcDocumentName", input.getName() ) ;
xsl.setParameter( "srcDocumentPath", input.getAbsolutePath() ) ;
xsl.transform(src, result );
I'd also like to point out, at the intention of many naysayers that there are cases where attribute order does matter.
Regression testing is an obvious case. Whoever has been called to optimise not-so-well written XSL knows that you usually want to make sure that "new" result trees are similar or identical to the "old" ones. And when the result tree are around one million lines, XML diff tools prove too unwieldy... In these cases, preserving attribute order is of great help.
Hope this helps ;-)
Upvotes: 32
Reputation: 276
I had the same exact problem. I wanted to modify XML attributes but wanted to keep the order because of diff. I used StAX to achieve this. You have to use XMLStreamReader and XMLStreamWriter (the Cursor based solution). When you get a START_ELEMENT event type, the cursor keeps the index of the attributes. Hence, you can make appropriate modifications and write them to the output file "in order".
Look at this article/discussion. You can see how to read the attributes of the start elements in order.
Upvotes: 2
Reputation: 11522
Robert Rossney said it well: if you're relying on the ordering of attributes, you're not really processing XML, but rather, something that looks like XML.
I can think of at least two reasons why you might care about attribute ordering. There may be others, but at least for these two I can suggest alternatives:
You're using multiple instances of attributes with the same name:
<foo myAttribute="a" myAttribute="b" myAttribute="c"/>
This is just plain invalid XML; a DOM processor will probably drop all but one of these values – if it processes the document at all. Instead of this, you want to use child elements:
<foo>
<myChild="a"/>
<myChild="b"/>
<myChild="c"/>
</foo>
You're assuming that some sort of distinction applies to the attribute(s) that come first. Make this explicit, either through other attributes or through child elements. For example:
<foo attr1="a" attr2="b" attr3="c" theMostImportantAttribute="attr1" />
Upvotes: 0
Reputation: 113292
XML Canonicalisation results in a consistent attribute ordering, primarily to allow one to check a signature over some or all of the XML, though there are other potential uses. This may suit your purposes.
Upvotes: 8
Reputation: 161783
It's not possible to over-emphasize what Robert Rossney just said, but I'll try. ;-)
The benefit of International Standards is that, when everybody follows them, life is good. All our software gets along peacefully.
XML has to be one of the most important standards we have. It's the basis of "old web" stuff like SOAP, and still 'web 2.0' stuff like RSS and Atom. It's because of clear standards that XML is able to interoperate between different platforms.
If we give up on XML, little by little, we'll get into a situation where a producer of XML will not be able to assume that a consumer of XML will be able to consumer their content. This would have a disasterous affect on the industry.
We should push back very forcefully, on anyone who writes code that does not process XML according to the standard. I understand that, in these economic times, there is a reluctance to offend customers and business partners by saying "no". But in this case, I think it's worth it. We would be in much worse financial shape if we had to hand-craft XML for each business partner.
So, don't "enable" companies who do not understand XML. Send them the standard, with the appropriate lines highlighted. They need to stop thinking that XML is just text with angle brackets in it. It simply does not behave like text with angle brackets in it.
It's not like there's an excuse for this. Even the smallest embedded devices can have full-featured XML parser implementations in them. I have not yet heard a good reason for not being able to parse standard XML, even if one can't afford a fully-featured DOM implementation.
Upvotes: 8
Reputation: 96750
Look at section 3.1 of the XML recommendation. It says, "Note that the order of attribute specifications in a start-tag or empty-element tag is not significant."
If a piece of software requires attributes on an XML element to appear in a specific order, that software is not processing XML, it's processing text that looks superficially like XML. It needs to be fixed.
If it can't be fixed, and you have to produce files that conform to its requirements, you can't reliably use standard XML tools to produce those files. For instance, you might try (as you suggest) to use XSLT to produce attributes in a defined order, e.g.:
<test>
<xsl:attribute name="foo"/>
<xsl:attribute name="bar"/>
<xsl:attribute name="baz"/>
</test>
only to find that the XSLT processor emits this:
<test bar="" baz="" foo=""/>
because the DOM that the processor is using orders attributes alphabetically by tag name. (That's common but not universal behavior among XML DOMs.)
But I want to emphasize something. If a piece of software violates the XML recommendation in one respect, it probably violates it in other respects. If it breaks when you feed it attributes in the wrong order, it probably also breaks if you delimit attributes with single quotes, or if the attribute values contain character entities, or any of a dozen other things that the XML recommendation says that an XML document can do that the author of this software probably didn't think about.
Upvotes: 24
Reputation: 91555
You really shouldn't need to keep any sort of order. As far as I know, no schema takes attribute order into account when validating an XML document either. It sounds like whatever is processing XML on the other end isn't using a proper DOM to parse the results.
I suppose one option would be to manually build up the document using string building, but I strongly recommend against that.
Upvotes: 0