Reputation: 19
I am trying to parse the following XML using Java:
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
</book>
<book id="bk109">
<author>Kress, Peter</author>
<title>Paradox Lost</title>
<genre>Science Fiction</genre>
<price>6.95</price>
<publish_date>2006-11-02</publish_date>
</book>
<book id="bk110">
<author>O'Brien, Tim</author>
<title>Microsoft .NET: The Programming Bible</title>
<genre>Computer</genre>
<price>36.95</price>
<publish_date>2006-12-09</publish_date>
</book>
<book id="bk112">
<author>Galos, Mike</author>
<title>Visual Studio 7: A Comprehensive Guide</title>
<genre>Computer</genre>
<price>49.95</price>
<publish_date>2008-04-16</publish_date>
</book>
</catalog>
What I need is to show all the books whose price is greater than 10 and that were published after 2005. So far I have:
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new File("books.xml"));
document.getDocumentElement().normalize();
// Every <book> element in the document, in document order
NodeList bookList = document.getElementsByTagName("book");
for (int i = 0; i < bookList.getLength(); i++) {
    Node book1 = bookList.item(i);
    if (book1.getNodeType() == Node.ELEMENT_NODE) {
        Element bookElement = (Element) book1;
        System.out.println("Book " + bookElement.getAttribute("id"));
        System.out.println("Author : " + bookElement.getElementsByTagName("author").item(0).getTextContent());
        //...
    }
}
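One way to finish the loop above without XPath is to read the price and publish_date children of each book and test both conditions in Java. This is a minimal sketch, assuming every book has both children and publish_date is always an ISO yyyy-MM-dd date; the class DomBookFilter and its methods filter and childText are hypothetical names, and the inline sample is trimmed to the fields the filter reads:

```java
import java.io.IOException;
import java.io.StringReader;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomBookFilter {

    // Text of the first child element with the given tag (assumes the child exists).
    static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    // Ids of books whose price exceeds minPrice and whose publish year is after afterYear.
    static List<String> filter(String xml, double minPrice, int afterYear) {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse the XML", e);
        }
        NodeList books = document.getElementsByTagName("book");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            // getElementsByTagName only returns elements, so this cast is safe
            Element book = (Element) books.item(i);
            double price = Double.parseDouble(childText(book, "price"));
            // LocalDate.parse accepts ISO-8601 dates such as 2006-11-02
            LocalDate published = LocalDate.parse(childText(book, "publish_date"));
            if (price > minPrice && published.getYear() > afterYear) {
                ids.add(book.getAttribute("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String xml = "<catalog>"
                + "<book id=\"bk101\"><price>44.95</price><publish_date>2000-10-01</publish_date></book>"
                + "<book id=\"bk109\"><price>6.95</price><publish_date>2006-11-02</publish_date></book>"
                + "<book id=\"bk110\"><price>36.95</price><publish_date>2006-12-09</publish_date></book>"
                + "<book id=\"bk112\"><price>49.95</price><publish_date>2008-04-16</publish_date></book>"
                + "</catalog>";
        System.out.println(filter(xml, 10, 2005)); // [bk110, bk112]
    }
}
```

To read books.xml as in the question, parse with builder.parse(new File("books.xml")) instead of the InputSource. Parsing the date with LocalDate (rather than slicing the string) makes a malformed publish_date fail loudly instead of silently comparing wrong.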
Reputation: 3965
You can try this step by step:
1 - I used the XML from the question as the input:
String source =
"<?xml version=\"1.0\"?>" +
"<catalog>\n" +
" <book id=\"bk101\">\n" +
" <author>Gambardella, Matthew</author>\n" +
" <title>XML Developer's Guide</title>\n" +
" <genre>Computer</genre>\n" +
" <price>44.95</price>\n" +
" <publish_date>2000-10-01</publish_date>\n" +
" </book>\n" +
" <book id=\"bk109\">\n" +
" <author>Kress, Peter</author>\n" +
" <title>Paradox Lost</title>\n" +
" <genre>Science Fiction</genre>\n" +
" <price>6.95</price>\n" +
" <publish_date>2006-11-02</publish_date>\n" +
" </book>\n" +
" <book id=\"bk110\">\n" +
" <author>O'Brien, Tim</author>\n" +
" <title>Microsoft .NET: The Programming Bible</title>\n" +
" <genre>Computer</genre>\n" +
" <price>36.95</price>\n" +
" <publish_date>2006-12-09</publish_date>\n" +
" </book>\n" +
" <book id=\"bk112\">\n" +
" <author>Galos, Mike</author>\n" +
" <title>Visual Studio 7: A Comprehensive Guide</title>\n" +
" <genre>Computer</genre>\n" +
" <price>49.95</price>\n" +
" <publish_date>2008-04-16</publish_date>\n" +
" </book>\n" +
"</catalog>";
2 - Parse the XML into a DOM Document:
Note: if you are reading from a file instead, use documentBuilder.parse(new File("filename.xml")).
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
InputSource inputSource = new InputSource(new StringReader(source));
Document document = documentBuilder.parse(inputSource);
3 - Compile an XPath expression that selects only the matching books:
Note: this uses the expression suggested by @Martin Honnen.
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
String xpathExpression = "//catalog//book[price > 10 and number(substring(publish_date, 1, 4)) > 2005]";
XPathExpression xPathExpression = xpath.compile(xpathExpression);
NodeList nodes = (NodeList) xPathExpression.evaluate(document, XPathConstants.NODESET);
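If the thresholds should not be hard-coded in the expression string, JAXP lets the expression reference XPath variables whose values come from an XPathVariableResolver set on the XPath object. A sketch under that assumption; XPathBookFilter, filter, $minPrice and $afterYear are illustrative names, not part of the answer above:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XPathBookFilter {

    // Same predicate as step 3, but the thresholds are XPath variables
    private static final String EXPRESSION =
            "//catalog//book[price > $minPrice"
            + " and number(substring(publish_date, 1, 4)) > $afterYear]";

    static List<String> filter(String xml, double minPrice, int afterYear) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Values are returned as Doubles (XPath 1.0 numbers); the resolver is set before compile()
            xpath.setXPathVariableResolver((QName name) -> {
                switch (name.getLocalPart()) {
                    case "minPrice":
                        return minPrice;
                    case "afterYear":
                        return (double) afterYear;
                    default:
                        return null;
                }
            });
            XPathExpression expression = xpath.compile(EXPRESSION);
            NodeList nodes = (NodeList) expression.evaluate(document, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return ids;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalStateException("Filtering failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog>"
                + "<book id=\"bk101\"><price>44.95</price><publish_date>2000-10-01</publish_date></book>"
                + "<book id=\"bk109\"><price>6.95</price><publish_date>2006-11-02</publish_date></book>"
                + "<book id=\"bk110\"><price>36.95</price><publish_date>2006-12-09</publish_date></book>"
                + "<book id=\"bk112\"><price>49.95</price><publish_date>2008-04-16</publish_date></book>"
                + "</catalog>";
        System.out.println(filter(xml, 10, 2005)); // [bk110, bk112]
        System.out.println(filter(xml, 40, 2005)); // [bk112]
    }
}
```

The substring/number trick is still needed because XPath 1.0 has no date type: its relational operators convert both sides to numbers, and a full date such as 2006-11-02 does not convert to a number.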
4 - Extract the fields you want from each matching book:
Note: the nodes.item(i).getNodeType() == Node.ELEMENT_NODE check skips anything that is not an element. The XPath expression above only selects book elements, so here it is just a safeguard.
for (int i = 0; i < nodes.getLength(); i++) {
    if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
        Element element = (Element) nodes.item(i);
        String author = element.getElementsByTagName("author")
                .item(0).getTextContent();
        String title = element.getElementsByTagName("title")
                .item(0).getTextContent();
        String genre = element.getElementsByTagName("genre")
                .item(0).getTextContent();
        String price = element.getElementsByTagName("price")
                .item(0).getTextContent();
        String publish_date = element.getElementsByTagName("publish_date")
                .item(0).getTextContent();
        System.out.println(String.format(
                "[author=%s, title=%s, genre=%s, price=%s, publish_date=%s]",
                author, title, genre, price, publish_date));
    }
}
5 - The output will be like this:
[author=O'Brien, Tim, title=Microsoft .NET: The Programming Bible, genre=Computer, price=36.95, publish_date=2006-12-09]
[author=Galos, Mike, title=Visual Studio 7: A Comprehensive Guide, genre=Computer, price=49.95, publish_date=2008-04-16]
The working code is here https://ideone.com/mLPwrf
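A variation on step 4, assuming you prefer XPath for the field lookups too: evaluating a relative path such as "title" against a book element returns that child's text, and a missing child gives an empty string instead of the NullPointerException that getElementsByTagName(...).item(0).getTextContent() would throw. BookFields, describe and describeFirstBook are hypothetical names used only for this sketch:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class BookFields {

    private static final String[] FIELDS = {"author", "title", "genre", "price", "publish_date"};

    // Formats one <book> as "[author=..., title=..., ...]" using paths relative to that book.
    static String describe(XPath xpath, Node book) throws XPathExpressionException {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < FIELDS.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            // evaluate(path, node) returns the string value; a missing child yields ""
            sb.append(FIELDS[i]).append('=').append(xpath.evaluate(FIELDS[i], book));
        }
        return sb.append(']').toString();
    }

    // Hypothetical convenience wrapper: parses xml and describes its first <book>.
    static String describeFirstBook(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node book = (Node) xpath.evaluate("(//book)[1]", document, XPathConstants.NODE);
            return describe(xpath, book);
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalStateException("Could not describe the book", e);
        }
    }

    public static void main(String[] args) {
        // genre and publish_date are deliberately left out of this sample
        String xml = "<catalog><book id=\"bk110\">"
                + "<author>O'Brien, Tim</author>"
                + "<title>Microsoft .NET: The Programming Bible</title>"
                + "<price>36.95</price>"
                + "</book></catalog>";
        System.out.println(describeFirstBook(xml));
        // [author=O'Brien, Tim, title=Microsoft .NET: The Programming Bible, genre=, price=36.95, publish_date=]
    }
}
```

Inside the step 4 loop you could call BookFields.describe(xpath, nodes.item(i)) with the xpath object created in step 3.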
Reputation: 163448
You can invoke an XSLT processor from Java: either the built-in XSLT 1.0 processor or Saxon, which supports XSLT 3.0. In XSLT 3.0 the stylesheet would look like this (untested):
<xsl:transform
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="3.0" expand-text="yes">
  <xsl:output method="text"/>
  <xsl:mode on-no-match="shallow-skip"/>
  <xsl:variable name="NL" select="'&#10;'"/>
  <xsl:template match="/">
    <xsl:apply-templates
        select="//book[price > 10 and year-from-date(xs:date(publish_date)) > 2005]"/>
  </xsl:template>
  <xsl:template match="book">
    <xsl:text>Book: {@id}{$NL}</xsl:text>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="title"> Title: {.}{$NL}</xsl:template>
  <xsl:template match="author"> Author: {.}{$NL}</xsl:template>
</xsl:transform>
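The stylesheet above needs an XSLT 3.0 processor such as Saxon. For the built-in XSLT 1.0 processor, one option is to rewrite the filter with the XPath 1.0 expression used in the previous answer and run it through javax.xml.transform. This is a sketch, assuming an XSLT 1.0 rewrite is acceptable; XsltBookReport and transform are illustrative names, and the output only approximates the 3.0 stylesheet:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltBookReport {

    // XSLT 1.0 version of the filter, so it runs on the JDK's built-in processor
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:for-each select=\"//book[price &gt; 10"
            + " and number(substring(publish_date, 1, 4)) &gt; 2005]\">"
            + "<xsl:value-of select=\"concat('Book: ', @id, '&#10;')\"/>"
            + "<xsl:value-of select=\"concat(' Author: ', author, '&#10;')\"/>"
            + "<xsl:value-of select=\"concat(' Title: ', title, '&#10;')\"/>"
            + "</xsl:for-each>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Applies STYLESHEET to the given XML and returns the text output.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog>"
                + "<book id=\"bk101\"><author>Gambardella, Matthew</author>"
                + "<title>XML Developer's Guide</title>"
                + "<price>44.95</price><publish_date>2000-10-01</publish_date></book>"
                + "<book id=\"bk110\"><author>O'Brien, Tim</author>"
                + "<title>Microsoft .NET: The Programming Bible</title>"
                + "<price>36.95</price><publish_date>2006-12-09</publish_date></book>"
                + "</catalog>";
        System.out.print(transform(xml));
    }
}
```

To use Saxon and the 3.0 stylesheet instead, Saxon's TransformerFactory implementation has to be on the classpath and selected, after which the same javax.xml.transform calls apply.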