Reputation: 23
I am trying to parse a POM file using Nokogiri, and want to get the first level child nodes.
My POM file looks something like this:
<project xmlns="some.maven.link">
<parent>
<groupId>parent.jar</groupId>
<artifactId>parent-jar</artifactId>
</parent>
<groupId>child.jar</groupId>
<artifactId>child-jar</artifactId>
</project>
I am trying to fetch the artifactId "child-jar", but the XPath I am using is possibly incorrect and it fetches "parent-jar" as the first occurrence.
This is my Ruby code:
@pom = Nokogiri::XML(File.open(file_path))
p @pom.xpath("/project/artifactId", "project"=>"http://maven.apache.org/POM/4.0.0")[0].text
I can access the second element, but that would just be a hack.
Upvotes: 2
Views: 1819
Reputation: 160551
Your XML sample does not appear to be correct. Simplifying it:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<project>
<parent>
<groupId>parent.jar</groupId>
<artifactId>parent-jar</artifactId>
</parent>
<groupId>child.jar</groupId>
<artifactId>child-jar</artifactId>
</project>
EOT
doc.at('project > artifactId').text # => "child-jar"
Using XPath I'd use:
doc.at('/project/artifactId').text # => "child-jar"
I'd suggest learning the difference between search, xpath, css and their at* cousins, which are all documented in the "Searching a XML/HTML Document" and Node documentation.
In the above example I removed the XML namespace information to simplify things. XML namespaces are useful but can also be irritating, and in your example XML you broke the declaration by not supplying a valid URL. Fixing the example with:
<project xmlns="http://www.w3.org/1999/xhtml">
I can use:
namespaces = doc.collect_namespaces # => {"xmlns"=>"http://www.w3.org/1999/xhtml"}
doc.at('project > artifactId', namespaces).text # => "child-jar"
or:
doc.at('xmlns|project > xmlns|artifactId').text # => "child-jar"
I prefer and recommend the first because it's more readable and less noisy. Nokogiri's CSS implementation helps simplify most selectors. Passing in the namespaces collected from the document simplifies searches, whether you're using CSS or XPath.
These also work:
doc.at('/xmlns:project/xmlns:artifactId').text # => "child-jar"
doc.at('/foo:project/foo:artifactId', {'foo' => "http://www.w3.org/1999/xhtml"}).text # => "child-jar"
Note that the second uses a renamed namespace, which is useful if you're dealing with redundant xmlns declarations in the document and need to differentiate between them.
Nokogiri's "Namespaces" tutorial is helpful.
Upvotes: 3