user411103

Parse content from a specific tag in XML file (Java)

I have an XML file like the one below, and I need to generate a .txt file containing the plain text of each <source> tag, one per line, using Java.

I read that I could use SAX to access the different elements, but in this case, where there can be arbitrary tags nested inside <source> as in the example below, I don't think that works.

What is the best approach to do this? Regex perhaps?

<?xml version="1.0" encoding="utf-8"?>
[...]
<source>
  <g id="_0">
    <g id="_1">First valid sentence</g>
  </g>
</source>
<source>Another valid string</source>

The output results.txt should be something like this:

First valid sentence
Another valid string

Upvotes: 4

Views: 1185

Answers (1)

Birei

Reputation: 36252

You can use the jOOX library to parse XML data. Its find() method returns all <source> elements, and getTextContent() extracts each element's text, including the text of any nested tags:

import java.io.File;
import java.io.IOException;
import org.xml.sax.SAXException;
import static org.joox.JOOX.$;

public class Main {

    public static void main(String[] args) throws SAXException, IOException {
        $(new File(args[0]))
            .find("source")
            .forEach(elem -> System.out.println(elem.getTextContent().trim()));

    }
}

I assume a well-formed XML file with a single root element, such as:

<?xml version="1.0" encoding="utf-8"?>
<root>
    <source>
        <g id="_0">
            <g id="_1">First valid sentence</g>
        </g>
    </source>
    <source>Another valid string</source>
</root>

And it yields:

First valid sentence
Another valid string
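
Since the question asks for a results.txt file rather than console output, here is a minimal sketch of the same idea using only the JDK's built-in DOM parser, in case adding a dependency is not an option. The class name SourceTextExtractor, the helper extractSourceTexts and the two command-line arguments (input XML path, output text path) are my own assumptions for illustration, not part of jOOX or of the original question. It also collapses runs of whitespace inside each <source>, which may or may not be what you want.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class SourceTextExtractor {

    // Hypothetical helper: returns the text of every <source> element,
    // in document order, one list entry per element.
    public static List<String> extractSourceTexts(InputStream xml)
            throws ParserConfigurationException, SAXException, IOException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(xml);
        NodeList sources = doc.getElementsByTagName("source");
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < sources.getLength(); i++) {
            // getTextContent() concatenates the text of all descendants,
            // so nested tags such as <g> are skipped over automatically.
            String text = sources.item(i).getTextContent();
            // Assumption: trim the ends and collapse inner whitespace runs
            // (including the line breaks from indentation) to single spaces.
            lines.add(text.trim().replaceAll("\\s+", " "));
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: input XML file, args[1]: output file, e.g. results.txt
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            Files.write(Paths.get(args[1]), extractSourceTexts(in), StandardCharsets.UTF_8);
        }
    }
}
```

Like the jOOX version, this needs well-formed input, so a regex is not required for the nested tags. It would be run along the lines of: java SourceTextExtractor input.xml results.txt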

Upvotes: 1
