Reputation: 3663
In all dbpedia pages, e.g.
http://dbpedia.org/page/Ireland
there's a link to an RDF file. In my application I need to analyse the RDF and run some logic on it. I could rely on the DBpedia SPARQL endpoint, but I prefer to download the RDF locally and parse it, to have full control over it.
I installed Jena and I'm trying to parse the RDF and extract, for example, a property called "geo:geometry".
I'm trying with:
StringReader sr = new StringReader( node.rdfCode )
Model model = ModelFactory.createDefaultModel()
model.read( sr, null )
How can I query the model to get the info I need?
For example, if I wanted to get the statement:
<rdf:Description rdf:about="http://dbpedia.org/resource/Ireland">
<geo:geometry xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" rdf:datatype="http://www.openlinksw.com/schemas/virtrdf#Geometry">POINT(-7 53)</geo:geometry>
</rdf:Description>
Or
<rdf:Description rdf:about="http://dbpedia.org/resource/Ireland">
<dbpprop:countryLargestCity xmlns:dbpprop="http://dbpedia.org/property/" xml:lang="en">Dublin</dbpprop:countryLargestCity>
</rdf:Description>
What is the right filter?
Many thanks! Mulone
Upvotes: 3
Views: 3526
Reputation: 16525
Once you have the file parsed in a Jena model you can iterate and filter with something like:
// Property to filter the model
Property geoProperty =
    model.createProperty("http://www.w3.org/2003/01/geo/wgs84_pos#",
                         "geometry");
// Iterator based on a SimpleSelector
StmtIterator iter =
    model.listStatements(new SimpleSelector(null, geoProperty, (RDFNode) null));
// Loop to traverse the statements that match the SimpleSelector
while (iter.hasNext()) {
    Statement stmt = iter.nextStatement();
    System.out.print(stmt.getSubject().toString());
    System.out.print(stmt.getPredicate().toString());
    System.out.println(stmt.getObject().toString());
}
The SimpleSelector allows you to pass any (subject, predicate, object) pattern to match statements in the model. In your case you only care about a specific predicate, so the first and third parameters of the constructor are null, which matches any subject and any object.
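To make the null-as-wildcard idea concrete, here is a minimal plain-Java sketch of the same matching rule. It is only an illustration and not how Jena implements it: the names TriplePatternSketch, Triple and filter are hypothetical, and nodes are simplified to plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; Triple and filter are hypothetical names, not Jena classes.
class TriplePatternSketch {

    // A statement reduced to three strings (real RDF nodes carry more information).
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // A null pattern component matches any value in that position.
    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }

    // Returns the triples that match the (subject, predicate, object) pattern.
    static List<Triple> filter(List<Triple> triples, String s, String p, String o) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples) {
            if (matches(s, t.subject) && matches(p, t.predicate) && matches(o, t.object)) {
                result.add(t);
            }
        }
        return result;
    }
}
```

Calling filter(triples, null, "http://www.w3.org/2003/01/geo/wgs84_pos#geometry", null) keeps only the geometry statements, whatever their subject and object are.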
Filtering on two different properties
To allow more complex filtering you can override the selects method of the SimpleSelector class, like here:
// Declared final so the anonymous class below can reference them
final Property geoProperty = /* like before */;
final Property countryLargestCityProperty =
    model.createProperty("http://dbpedia.org/property/",
                         "countryLargestCity");
// Match every statement, then keep only those with one of the two predicates
SimpleSelector selector = new SimpleSelector(null, null, (RDFNode) null) {
    @Override
    public boolean selects(Statement s) {
        return s.getPredicate().equals(geoProperty) ||
               s.getPredicate().equals(countryLargestCityProperty);
    }
};
StmtIterator iter = model.listStatements(selector);
while (iter.hasNext()) {
    /* same as in the previous example */
}
Edit: including a full example
This code includes a full example that works for me.
import com.hp.hpl.jena.util.FileManager;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.SimpleSelector;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.RDFNode;
import com.hp.hpl.jena.rdf.model.Literal;
import com.hp.hpl.jena.rdf.model.StmtIterator;
import com.hp.hpl.jena.rdf.model.Statement;

public class TestJena {
    public static void main(String[] args) {
        FileManager fManager = FileManager.get();
        fManager.addLocatorURL();
        Model model = fManager.loadModel("http://dbpedia.org/data/Ireland.rdf");
        Property geoProperty =
            model.createProperty("http://www.w3.org/2003/01/geo/wgs84_pos#",
                                 "geometry");
        StmtIterator iter =
            model.listStatements(new SimpleSelector(null, geoProperty, (RDFNode) null));
        // Loop to traverse the statements that match the SimpleSelector
        while (iter.hasNext()) {
            Statement stmt = iter.nextStatement();
            if (stmt.getObject().isLiteral()) {
                Literal obj = (Literal) stmt.getObject();
                System.out.println("The geometry predicate value is " +
                                   obj.getString());
            }
        }
    }
}
This full example prints out:
The geometry predicate value is POINT(-7 53)
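If you want to run logic on that value rather than just print it, the string can be split into coordinates. The sketch below assumes the literal always has the simple two-number form shown above, with longitude first and latitude second (which fits Ireland at roughly 7°W, 53°N). WktPoint is a hypothetical helper name, not part of Jena.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses values of the form "POINT(<longitude> <latitude>)".
class WktPoint {
    // Two plain decimal numbers separated by whitespace; exponents are not handled.
    private static final Pattern POINT = Pattern.compile(
            "POINT\\s*\\(\\s*(-?\\d+(?:\\.\\d+)?)\\s+(-?\\d+(?:\\.\\d+)?)\\s*\\)");

    final double longitude;
    final double latitude;

    private WktPoint(double longitude, double latitude) {
        this.longitude = longitude;
        this.latitude = latitude;
    }

    // Throws IllegalArgumentException if the text is not in the assumed form.
    static WktPoint parse(String wkt) {
        Matcher m = POINT.matcher(wkt.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a simple WKT point: " + wkt);
        }
        return new WktPoint(Double.parseDouble(m.group(1)),
                            Double.parseDouble(m.group(2)));
    }
}
```

In the loop above you could then call WktPoint.parse(obj.getString()) and read the longitude and latitude fields.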
Notes on Linked Data
http://dbpedia.org/page/Ireland
is the HTML document version of the resource http://dbpedia.org/resource/Ireland
To get the RDF you should either resolve:
http://dbpedia.org/data/Ireland.rdf
or request
http://dbpedia.org/resource/Ireland
with Accept: application/rdf+xml
in the HTTP header.
With curl it'd be something like:
curl -L -H 'Accept: application/rdf+xml' http://dbpedia.org/resource/Ireland
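The same request can be made from Java if you want to download the RDF yourself and hand the string to model.read, as in the question. This is a rough sketch assuming Java 11 or later (for java.net.http); DbpediaRdfFetcher and its method names are hypothetical, and the redirect behaviour of the server is not something the code checks.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper mirroring the curl command above.
class DbpediaRdfFetcher {
    static final String PAGE_PREFIX = "http://dbpedia.org/page/";
    static final String RESOURCE_PREFIX = "http://dbpedia.org/resource/";

    // Maps the HTML page URL (.../page/Ireland) to the resource URI (.../resource/Ireland).
    static String toResourceUri(String pageUrl) {
        if (!pageUrl.startsWith(PAGE_PREFIX)) {
            throw new IllegalArgumentException("Not a DBpedia page URL: " + pageUrl);
        }
        return RESOURCE_PREFIX + pageUrl.substring(PAGE_PREFIX.length());
    }

    // Builds a GET request that asks for RDF/XML through content negotiation.
    static HttpRequest rdfRequest(String resourceUri) {
        return HttpRequest.newBuilder(URI.create(resourceUri))
                .header("Accept", "application/rdf+xml")
                .GET()
                .build();
    }

    // Sends the request, following redirects like curl -L, and returns the body as text.
    static String fetchRdf(String resourceUri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(rdfRequest(resourceUri), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

The returned string could then be wrapped in a StringReader and passed to model.read(sr, null).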
Upvotes: 5