How to process the rdf version of a DBpedia page with Jena?

Question

In all DBpedia pages, e.g.

there's a link to an RDF file. In my application I need to analyse the RDF code and run some logic on it. I could rely on the DBpedia SPARQL endpoint, but I prefer to download the RDF code locally and parse it, to have full control over it.

I installed Jena and I'm trying to parse the code and extract, for example, a property called "geo:geometry".

I'm trying with:

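// node.rdfCode holds the downloaded RDF/XML as a String
StringReader sr = new StringReader(node.rdfCode);
Model model = ModelFactory.createDefaultModel();
model.read(sr, null); // null base URI; RDF/XML is the default syntax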

How can I query the model to get the info I need?

For example, if I wanted to get the statement:


POINT(-7 53)

Or


Dublin

What is the right filter?

Many thanks! Mulone

Manuel Salvadores · Accepted Answer

Once you have the file parsed in a Jena model you can iterate and filter with something like:

// Property to filter the model on: geo:geometry
Property geoProperty =
    model.createProperty("http://www.w3.org/2003/01/geo/wgs84_pos#",
                         "geometry");

// Iterator based on a SimpleSelector: (any subject, geo:geometry, any object)
StmtIterator iter =
    model.listStatements(new SimpleSelector(null, geoProperty, (RDFNode) null));

// Loop to traverse the statements that match the SimpleSelector
while (iter.hasNext()) {
    Statement stmt = iter.nextStatement();
    // Print subject, predicate and object separated by spaces
    System.out.println(stmt.getSubject() + " " + stmt.getPredicate() + " " + stmt.getObject());
}

A SimpleSelector lets you pass any (subject, predicate, object) pattern to match statements in the model; a null in any position matches anything. In your case you only care about a specific predicate, so the first and third constructor arguments are null.
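To make the "null matches anything" rule concrete, here is a small stdlib-only Java sketch of the matching idea. It is not Jena's implementation: `SimpleTriple` and `PatternMatch` are hypothetical stand-ins for Jena's `Statement` and `SimpleSelector`, with plain strings in place of RDF nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an RDF statement; plain strings replace Jena's RDF nodes.
final class SimpleTriple {
    final String subject;
    final String predicate;
    final String object;

    SimpleTriple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

final class PatternMatch {
    // A null position in the pattern acts as a wildcard, as in new SimpleSelector(null, p, null).
    static boolean matches(SimpleTriple t, String s, String p, String o) {
        return (s == null || s.equals(t.subject))
            && (p == null || p.equals(t.predicate))
            && (o == null || o.equals(t.object));
    }

    // Returns every triple that matches the (s, p, o) pattern, in input order.
    static List<SimpleTriple> select(List<SimpleTriple> triples, String s, String p, String o) {
        List<SimpleTriple> result = new ArrayList<>();
        for (SimpleTriple t : triples) {
            if (matches(t, s, p, o)) {
                result.add(t);
            }
        }
        return result;
    }
}
```

With all three pattern positions set to null, every triple matches, which is why a selector built that way needs an extra test to narrow the results.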

Filtering on two different properties

For more complex filtering you can subclass SimpleSelector and override its selects method, like here:

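final Property geoProperty =
    model.createProperty("http://www.w3.org/2003/01/geo/wgs84_pos#", "geometry");
final Property countryLargestCityProperty =
    model.createProperty("http://dbpedia.org/property/", "countryLargestCity");

// Anonymous subclass: Jena calls selects() on each candidate statement
SimpleSelector selector = new SimpleSelector(null, null, (RDFNode) null) {
    @Override
    public boolean selects(Statement s) {
        return s.getPredicate().equals(geoProperty) ||
               s.getPredicate().equals(countryLargestCityProperty);
    }
};
StmtIterator iter = model.listStatements(selector);
while (iter.hasNext()) {
    // Same kind of loop body as in the previous example
    Statement stmt = iter.nextStatement();
    System.out.println(stmt.getPredicate() + " " + stmt.getObject());
}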
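With more than two properties the chain of `||` comparisons keeps growing. One alternative (an assumption on my part, not part of the original answer) is to keep the wanted predicate URIs in a `Set` and test membership. This stdlib-only sketch shows that filter on plain `{subject, predicate, object}` string arrays standing in for Jena statements; `MultiPropertyFilter` and its constants are hypothetical names.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class MultiPropertyFilter {
    // Predicate URIs mirroring the two properties used in the answer.
    static final String GEO_GEOMETRY = "http://www.w3.org/2003/01/geo/wgs84_pos#geometry";
    static final String LARGEST_CITY = "http://dbpedia.org/property/countryLargestCity";

    // Each row is {subject, predicate, object}, a stand-in for a Jena Statement.
    // Keeps rows whose predicate is in the wanted set, like an overridden selects().
    static List<String[]> keep(List<String[]> rows, Set<String> wantedPredicates) {
        return rows.stream()
                   .filter(row -> wantedPredicates.contains(row[1]))
                   .collect(Collectors.toList());
    }
}
```

Inside the Jena selector the equivalent check would be `wanted.contains(s.getPredicate())`, assuming `wanted` is a `Set<Property>` built from the `createProperty` calls above; adding a property then means adding one set element rather than another comparison.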

Edit: including a full example

Here is a complete example that works for me.

// Package names are from Jena 2.x; Apache Jena 3 and later use org.apache.jena.*
import com.hp.hpl.jena.util.FileManager;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.SimpleSelector;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.RDFNode;
import com.hp.hpl.jena.rdf.model.Literal;
import com.hp.hpl.jena.rdf.model.StmtIterator;
import com.hp.hpl.jena.rdf.model.Statement;

public class TestJena {

    public static void main(String[] args) {
        // Register a locator so the FileManager can fetch http:// URLs
        FileManager fManager = FileManager.get();
        fManager.addLocatorURL();
        Model model = fManager.loadModel("http://dbpedia.org/data/Ireland.rdf");

        Property geoProperty =
            model.createProperty("http://www.w3.org/2003/01/geo/wgs84_pos#",
                                 "geometry");

        StmtIterator iter =
            model.listStatements(new SimpleSelector(null, geoProperty, (RDFNode) null));

        // Loop to traverse the statements that match the SimpleSelector
        while (iter.hasNext()) {
            Statement stmt = iter.nextStatement();
            // Skip non-literal objects; the geometry value is a literal
            if (stmt.getObject().isLiteral()) {
                Literal obj = (Literal) stmt.getObject();
                System.out.println("The geometry predicate value is " +
                                   obj.getString());
            }
        }
    }

}

This full example prints out:

The geometry predicate value is POINT(-7 53)
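The geometry value comes back as a plain string in WKT (Well-Known Text) form. If your logic needs the coordinates as numbers, a small regex-based parser is enough for the POINT case. This sketch assumes the literal always has the `POINT(x y)` shape printed above, and that x is longitude and y latitude (consistent with Ireland lying at roughly 7°W, 53°N). `WktPoint` is a hypothetical helper, not a Jena class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WktPoint {
    final double longitude;
    final double latitude;

    private WktPoint(double longitude, double latitude) {
        this.longitude = longitude;
        this.latitude = latitude;
    }

    // Matches "POINT(x y)" with optional whitespace; x and y are plain decimal numbers.
    private static final Pattern POINT = Pattern.compile(
        "\\s*POINT\\s*\\(\\s*(-?\\d+(?:\\.\\d+)?)\\s+(-?\\d+(?:\\.\\d+)?)\\s*\\)\\s*");

    // Assumes x (first number) is longitude and y (second number) is latitude.
    static WktPoint parse(String wkt) {
        Matcher m = POINT.matcher(wkt);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a WKT POINT: " + wkt);
        }
        return new WktPoint(Double.parseDouble(m.group(1)), Double.parseDouble(m.group(2)));
    }
}
```

In the full example above you would call `WktPoint.parse(obj.getString())` inside the `isLiteral()` branch. Other WKT shapes (LINESTRING, POLYGON) are rejected rather than guessed at.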

Notes on Linked Data

http://dbpedia.org/page/Ireland is the HTML document describing the resource http://dbpedia.org/resource/Ireland.

To get the RDF, resolve either:

http://dbpedia.org/data/Ireland.rdf

or

http://dbpedia.org/resource/Ireland with the HTTP header Accept: application/rdf+xml. With curl it would be something like:

curl -L -H 'Accept: application/rdf+xml' http://dbpedia.org/resource/Ireland
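If you would rather fetch the RDF from Java than from curl, here is a sketch using the JDK's `java.net.http` client (Java 11+). It assumes the page/resource/data URL pattern shown for Ireland holds for other entities and only handles the `http://dbpedia.org/page/` prefix used in this answer; `DbpediaRdfRequests` and its methods are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;

final class DbpediaRdfRequests {
    private static final String PAGE_PREFIX = "http://dbpedia.org/page/";

    // Extracts the local name, e.g. "Ireland" from http://dbpedia.org/page/Ireland.
    static String localName(String pageUrl) {
        if (!pageUrl.startsWith(PAGE_PREFIX)) {
            throw new IllegalArgumentException("Not a DBpedia page URL: " + pageUrl);
        }
        return pageUrl.substring(PAGE_PREFIX.length());
    }

    // Option 1: request the RDF/XML document directly, e.g. http://dbpedia.org/data/Ireland.rdf
    static HttpRequest dataRequest(String pageUrl) {
        URI data = URI.create("http://dbpedia.org/data/" + localName(pageUrl) + ".rdf");
        return HttpRequest.newBuilder(data).GET().build();
    }

    // Option 2: request the resource URI with content negotiation, like curl -H 'Accept: ...'
    static HttpRequest negotiatedRequest(String pageUrl) {
        URI resource = URI.create("http://dbpedia.org/resource/" + localName(pageUrl));
        return HttpRequest.newBuilder(resource)
                          .header("Accept", "application/rdf+xml")
                          .GET()
                          .build();
    }

    // Follows redirects like curl -L; NORMAL does not follow HTTPS-to-HTTP redirects.
    static HttpClient client() {
        return HttpClient.newBuilder()
                         .followRedirects(HttpClient.Redirect.NORMAL)
                         .build();
    }
}
```

Sending one of these with `client().send(request, HttpResponse.BodyHandlers.ofString())` returns the RDF/XML text in `body()`, which you could then hand to `model.read(new StringReader(body), null)` as in the question.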
