Reputation: 107

Trying to read an xml file with duplicate attributes

The file I'm trying to read comes from legacy software that is no longer supported. I'm trying to pull the data out via the XML export option it offers and port it to a newer version I'm building in Java. The problem, for which I can't find a solution, is that one of the elements has duplicate attributes with different data.

Now I know I could just build my own parser (and I'm afraid I'll have to, either in part or in whole), but I'd rather not, as it's reinventing the wheel for one damnable piece. Can I force the parser to read around the data? For example, rename the second attribute to "attribute1"? Or could I just ignore the second attribute? Maybe merge the two values together like "part1/part2"? The data is not important, yet some users might miss it, and the fewer reasons I give them to stay with the old system the better.

Ideally I'd like to be able to send data back to the original program for those who don't want to change, so any option that would keep the data the same would be the best.

Thank you for your time.

Upvotes: 3

Answers (3)

Gunther

Reputation: 5256

TagSoup is the way to go, as already proposed by forty-two, and I'm surprised you didn't get it to work.

This is a link leading to the download: http://ccil.org/~cowan/XML/tagsoup/

And here is a complete example (using JDOM2). The output shows that the first occurrence of attribute a vanished from the result.

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.jdom2.Document;
import org.jdom2.input.SAXBuilder;
import org.jdom2.output.XMLOutputter;

public class ParseDuplicateAttributeWithTagSoup
{
  public static void main(String[] args) throws Exception
  {
    // Not well-formed XML: attribute a appears twice on the same element.
    String nonWellformed = "<?xml version='1.0' encoding='UTF-8'?><x a='1' a='2'/>";
    InputStream is = new ByteArrayInputStream(nonWellformed.getBytes(StandardCharsets.UTF_8));
    // Have JDOM2 use TagSoup's lenient SAX parser instead of the default one.
    SAXBuilder parser = new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
    Document doc = parser.build(is);
    // Print the resulting document to see which attribute value survived.
    new XMLOutputter().output(doc, System.out);
  }
}

Upvotes: 1

forty-two

Reputation: 12817

You can use TagSoup. It has an XMLReader implementation that will accept almost anything you throw at it. In this case I suspect it will just silently drop one of the attributes.

You can use the XMLReader as is, together with a JAXP SAXParser, or with JDOM or DOM4J.
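If dropping one of the values is not acceptable (the question mentions renaming the second attribute instead), another option is to patch the raw text before handing it to a regular parser. The following is only a rough sketch using the JDK alone, not TagSoup. The class name DuplicateAttributeRenamer, the method renameDuplicates, and the numeric suffix scheme (a, a1, a2) are all made up for illustration. It assumes ASCII element and attribute names and quoted attribute values, it does not treat comments or CDATA sections specially, and a suffixed name could collide with a real attribute of the same name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper: renames repeated attributes so a strict parser accepts the input.
final class DuplicateAttributeRenamer {
  // A start tag or empty-element tag with quoted attribute values (ASCII names only).
  private static final Pattern START_TAG = Pattern.compile(
      "<[A-Za-z_][\\w.:-]*(?:\\s+[A-Za-z_][\\w.:-]*\\s*=\\s*(?:\"[^\"]*\"|'[^']*'))*\\s*/?>");
  // One name=value pair inside such a tag.
  private static final Pattern ATTRIBUTE = Pattern.compile(
      "([A-Za-z_][\\w.:-]*)(\\s*=\\s*(?:\"[^\"]*\"|'[^']*'))");

  // Rewrites every start tag in the text; everything else is copied unchanged.
  public static String renameDuplicates(String xml) {
    StringBuilder out = new StringBuilder();
    Matcher tag = START_TAG.matcher(xml);
    int last = 0;
    while (tag.find()) {
      out.append(xml, last, tag.start()).append(renameInTag(tag.group()));
      last = tag.end();
    }
    return out.append(xml, last, xml.length()).toString();
  }

  // Within one tag, the 2nd, 3rd, ... occurrence of a name gets a numeric suffix.
  private static String renameInTag(String tag) {
    Set<String> seen = new HashSet<>();
    StringBuilder out = new StringBuilder();
    Matcher attr = ATTRIBUTE.matcher(tag);
    int last = 0;
    while (attr.find()) {
      String name = attr.group(1);
      String unique = name;
      // Try name, name1, name2, ... until one is unused in this tag.
      for (int i = 1; !seen.add(unique); i++) {
        unique = name + i;
      }
      out.append(tag, last, attr.start()).append(unique).append(attr.group(2));
      last = attr.end();
    }
    return out.append(tag, last, tag.length()).toString();
  }

  public static void main(String[] args) throws Exception {
    String nonWellformed = "<?xml version='1.0' encoding='UTF-8'?><x a='1' a='2'/>";
    String fixed = renameDuplicates(nonWellformed);
    // The patched text is well-formed, so the standard JAXP DOM parser can read it.
    Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(fixed.getBytes(StandardCharsets.UTF_8)))
        .getDocumentElement();
    System.out.println(root.getAttribute("a"));
    System.out.println(root.getAttribute("a1"));
  }
}
```

Since both values survive under distinct names, it should be possible to map a1 back to a when writing data for the old program, although that export step is not shown here.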

Upvotes: 1

dinesh707

Reputation: 12592

You can use: http://www.jdom.org/docs/apidocs/org/jdom2/Element.html#getAttributes%28%29

And the Attribute object will contain both key and value you are looking for.

Upvotes: -1
