Reputation: 107
The file I'm trying to read comes from legacy software that is no longer supported. I'm trying to pull the data out via the XML export option it offers and port it to a newer version I'm building in Java. The problem, for which I haven't found a solution, is that one of the elements has duplicate attributes with different data.
Now I know I could just build my own parser (and I'm afraid I'll have to do that either in part or in whole), but I'd rather not, as it's reinventing the wheel for one damnable piece. Can I force a standard parser to read around the data? Like change the name of the second one to "attribute1"? Or could I just ignore the second attribute? Maybe marry the two pieces of data together like "part1/part2"? The data is not important, yet some users might miss it, and the fewer reasons I give them to stay with the old system the better.
Ideally I'd like to be able to send data back to the original program for those who don't want to change, so any option that would keep the data the same would be the best.
Thank you for your time.
Upvotes: 3
Views: 1526
Reputation: 5256
TagSoup is the way to go, as already proposed by forty-two, and I'm surprised you didn't get it to work.
This is a link leading to the download: http://ccil.org/~cowan/XML/tagsoup/
And here is a complete example (using JDOM2). The output shows that the first occurrence of attribute a vanished from the result.
import java.io.ByteArrayInputStream;
import java.io.InputStream;

import org.jdom2.Document;
import org.jdom2.input.SAXBuilder;
import org.jdom2.output.XMLOutputter;

public class ParseDuplicateAttributeWithTagSoup
{
    public static void main(String[] args) throws Exception
    {
        String nonWellformed = "<?xml version='1.0' encoding='UTF-8'?><x a='1' a='2'/>";
        InputStream is = new ByteArrayInputStream(nonWellformed.getBytes("UTF-8"));
        SAXBuilder parser = new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
        Document doc = parser.build(is);
        new XMLOutputter().output(doc, System.out);
    }
}
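If losing one of the two values is not acceptable, the "rename the second one to attribute1" idea from the question can be approximated by pre-processing the raw text before any XML parser sees it. The following is only a rough sketch using the JDK alone, not part of TagSoup or JDOM: the class name DuplicateAttributeRenamer and the numeric-suffix convention (a, a1, a2, ...) are my own assumptions, and the regular expressions only cover ASCII names and quoted values. It would also rewrite tag-like text inside comments or CDATA sections, so check it against the real export first.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: renames repeated attributes so a strict parser accepts the input.
public class DuplicateAttributeRenamer {

    // A start tag or empty-element tag: group 1 = element name,
    // group 2 = all attributes, group 3 = closing "/>" or ">".
    private static final Pattern TAG = Pattern.compile(
        "<([A-Za-z_][\\w:.-]*)((?:\\s+[\\w:.-]+\\s*=\\s*(?:\"[^\"]*\"|'[^']*'))*)(\\s*/?>)");

    // A single attribute: group 1 = name, group 2 = "=" plus the quoted value.
    private static final Pattern ATTR = Pattern.compile(
        "([\\w:.-]+)(\\s*=\\s*(?:\"[^\"]*\"|'[^']*'))");

    public static String renameDuplicateAttributes(String xml) {
        Matcher tag = TAG.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (tag.find()) {
            Set<String> used = new HashSet<>();
            Matcher attr = ATTR.matcher(tag.group(2));
            StringBuffer attrs = new StringBuffer();
            while (attr.find()) {
                String name = attr.group(1);
                String unique = name;
                // Append 1, 2, ... until the name is unused within this tag.
                for (int i = 1; !used.add(unique); i++) {
                    unique = name + i;
                }
                attr.appendReplacement(attrs,
                    Matcher.quoteReplacement(unique + attr.group(2)));
            }
            attr.appendTail(attrs);
            String rebuilt = "<" + tag.group(1) + attrs + tag.group(3);
            tag.appendReplacement(out, Matcher.quoteReplacement(rebuilt));
        }
        tag.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String nonWellformed = "<?xml version='1.0' encoding='UTF-8'?><x a='1' a='2'/>";
        String fixed = renameDuplicateAttributes(nonWellformed);

        // The cleaned text is well-formed, so the JDK's own DOM parser can read it.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(fixed)));
        Element root = doc.getDocumentElement();

        // Both values survive; here they are joined as "part1/part2".
        System.out.println(root.getAttribute("a") + "/" + root.getAttribute("a1")); // prints 1/2
    }
}
```

Because the renaming is deterministic, the suffix could be stripped again when writing data back for the old program, provided no genuine attribute in the export already ends in such a number.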
Upvotes: 1
Reputation: 12817
You can use TagSoup. It has an XMLReader implementation that will accept almost anything you throw at it. In this case I suspect it will just silently drop one of the attributes.
You can use the XMLReader as is, together with a JAXP SAXParser, or with JDOM or DOM4J.
Upvotes: 1
Reputation: 12592
You can use Element.getAttributes(): http://www.jdom.org/docs/apidocs/org/jdom2/Element.html#getAttributes%28%29
Each Attribute object it returns will contain both the key and the value you are looking for.
Upvotes: -1