Reputation: 16037
Can the Java streaming XML parser, i.e. javax.xml.stream.XMLEventReader, distinguish an empty element
<document>
<empty></empty>
</document>
from a self-closing empty element?
<document>
<empty/>
</document>
Suppose we parse both of the above XML fragments and print the event type and the event itself, like this:
System.out.println("eventType:" + event.getEventType() + "; element:"+event.toString());
Both of the above fragments will produce the exact same result:
eventType:7; element:<?xml version="null" encoding='null' standalone='no'?>
eventType:1; element:<document>
eventType:4; element:
eventType:1; element:<empty>
eventType:2; element:</empty>
eventType:2; element:</document>
eventType:8; element:ENDDOCUMENT
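The numbers above are the XMLStreamConstants values (7 = START_DOCUMENT, 1 = START_ELEMENT, 4 = CHARACTERS, 2 = END_ELEMENT, 8 = END_DOCUMENT). A minimal, self-contained sketch of the loop around that print statement might look like the following; the class name EventDump and the helper eventTypes are hypothetical, and the inputs are written on one line so no whitespace CHARACTERS events appear. The exact toString() text depends on the StAX implementation, but the sequence of event types is what matters here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class EventDump {

    /** Prints every event and returns the event type constants, in document order. */
    public static List<Integer> eventTypes(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<Integer> types = new ArrayList<>();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            System.out.println("eventType:" + event.getEventType() + "; element:" + event);
            types.add(event.getEventType());
        }
        reader.close();
        return types;
    }

    public static void main(String[] args) throws XMLStreamException {
        List<Integer> explicitPair = eventTypes("<document><empty></empty></document>");
        List<Integer> selfClosing = eventTypes("<document><empty/></document>");
        // Both forms yield START_ELEMENT followed by END_ELEMENT for <empty>.
        System.out.println("same event types: " + explicitPair.equals(selfClosing));
    }
}
```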
For context: we want to rewrite some parts of the XML according to certain rules, while preserving the other parts exactly as they are. In particular, we want to keep empty elements in their original form, even though the two forms are semantically equivalent: a normal empty element (first example) should stay that way, and a self-closing empty element should be written as a self-closing element in the result as well. Can we achieve this with javax.xml.stream.XMLEventReader?
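Whatever the reader can report, the writing half of this goal can be expressed with StAX: XMLStreamWriter has writeEmptyElement for a self-closing tag, while writeStartElement followed by writeEndElement produces an explicit start/end pair. A minimal sketch, assuming the JDK's default writer (some StAX implementations can be configured to collapse an empty start/end pair into a self-closing tag); the class and method names are hypothetical:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class EmptyElementForms {

    /** Writes <document> containing one empty element in the requested form. */
    public static String write(boolean selfClosing) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("document");
        if (selfClosing) {
            writer.writeEmptyElement("empty");   // self-closing form
        } else {
            writer.writeStartElement("empty");   // explicit start tag...
            writer.writeEndElement();            // ...and end tag
        }
        writer.writeEndElement();                // closes <document>
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write(true));
        System.out.println(write(false));
    }
}
```

The open question is therefore only on the reading side: knowing, for each END_ELEMENT, which of the two forms the input used.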
Upvotes: 1
Views: 1186
Reputation: 5568
You could test whether the start event and the end event have the same location:
event.getLocation().getCharacterOffset();
From the Javadoc:
Return the byte or character offset into the input source this location is pointing to. If the input source is a file or a byte stream then this is the byte offset into that stream, but if the input source is a character media then the offset is the character offset. Returns -1 if there is no offset available.
The offset is not guaranteed to be available; that depends on your setup, so it is worth trying to see whether it works in yours. (Also, it can only represent offsets up to Integer.MAX_VALUE.)
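A minimal sketch of this heuristic, assuming the parser reports character offsets and reports the same offset for a START_ELEMENT and the END_ELEMENT it synthesizes for a self-closing tag. Neither is guaranteed by the StAX specification, so treat the result as a hint. The class SelfClosingCheck, the helper selfClosingFlags, and the use of a stack to match nested start/end events are illustrative additions, not part of the answer; null means no offset was available.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.Location;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class SelfClosingCheck {

    // Character offset of an event, or -1 if the parser does not provide one.
    private static int offsetOf(XMLEvent event) {
        Location loc = event.getLocation();
        return loc == null ? -1 : loc.getCharacterOffset();
    }

    /**
     * One entry per element, in document order: TRUE if the start and end events
     * report the same offset (heuristically a self-closing tag), FALSE if they
     * differ, null if no offset was available.
     */
    public static List<Boolean> selfClosingFlags(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<Boolean> flags = new ArrayList<>();
        // For each currently open element: {index into flags, start offset}.
        Deque<int[]> open = new ArrayDeque<>();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                open.push(new int[] {flags.size(), offsetOf(event)});
                flags.add(null); // decided when the matching end event arrives
            } else if (event.isEndElement()) {
                int[] start = open.pop();
                int endOffset = offsetOf(event);
                if (start[1] >= 0 && endOffset >= 0) {
                    flags.set(start[0], start[1] == endOffset);
                }
            }
        }
        reader.close();
        return flags;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(selfClosingFlags("<document><empty></empty><empty/></document>"));
    }
}
```

In a rewriting pipeline, the flag for an element could then decide whether to emit it with writeEmptyElement or with a start/end pair.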
Upvotes: 1
Reputation: 163468
The answer is no. Similarly, you can't preserve whitespace within a tag (e.g. newlines between attribute values, or spaces around the "=" sign). These are considered to be of no interest to applications, and are therefore not reported.
Upvotes: 0