Mulone

Reputation: 3663

How to process the rdf version of a DBpedia page with Jena?

On every DBpedia page, e.g.

http://dbpedia.org/page/Ireland

there's a link to an RDF file. In my application I need to analyse the RDF and run some logic on it. I could rely on the DBpedia SPARQL endpoint, but I prefer to download the RDF and parse it locally, to have full control over it.

I installed Jena and I'm trying to parse the RDF and extract, for example, the property "geo:geometry".

I'm trying with:

StringReader sr = new StringReader( node.rdfCode );
Model model = ModelFactory.createDefaultModel();
model.read( sr, null );

How can I query the model to get the info I need?

For example, if I wanted to get the statement:

<rdf:Description rdf:about="http://dbpedia.org/resource/Ireland">
<geo:geometry xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" rdf:datatype="http://www.openlinksw.com/schemas/virtrdf#Geometry">POINT(-7 53)</geo:geometry>
</rdf:Description>

Or

<rdf:Description rdf:about="http://dbpedia.org/resource/Ireland">
<dbpprop:countryLargestCity xmlns:dbpprop="http://dbpedia.org/property/" xml:lang="en">Dublin</dbpprop:countryLargestCity>
</rdf:Description>

What is the right filter?

Many thanks! Mulone

Upvotes: 3

Views: 3526

Answers (1)

Manuel Salvadores

Reputation: 16525

Once you have the file parsed in a Jena model you can iterate and filter with something like:

//Property to filter the model
Property geoProperty =
    model.createProperty("http://www.w3.org/2003/01/geo/wgs84_pos#",
                         "geometry");

//Iterator based on a Simple selector
StmtIterator iter =
  model.listStatements(new SimpleSelector(null, geoProperty, (RDFNode)null)); 

//Loop to traverse the statements that match the SimpleSelector
while (iter.hasNext()) {
   Statement stmt = iter.nextStatement();
   System.out.print(stmt.getSubject().toString());
   System.out.print(stmt.getPredicate().toString());
   System.out.println(stmt.getObject().toString());

}

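The SimpleSelector lets you pass any (subject, predicate, object) pattern to match statements in the model. A null argument acts as a wildcard, so if you only care about a specific predicate, the first and third constructor arguments are null. The (RDFNode) cast on the third argument is needed because the constructor is overloaded and a bare null would be ambiguous.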

Filtering on two different properties

For more complex filtering you can override the selects method of the SimpleSelector class, like this:

final Property geoProperty = /* like before */;
final Property countryLargestCityProperty =
    model.createProperty("http://dbpedia.org/property/",
                         "countryLargestCity");

// The (null, null, null) pattern matches every statement;
// selects() then keeps only the two predicates of interest.
SimpleSelector selector = new SimpleSelector(null, null, (RDFNode) null) {
    @Override
    public boolean selects(Statement s) {
        return s.getPredicate().equals(geoProperty) ||
               s.getPredicate().equals(countryLargestCityProperty);
    }
};
StmtIterator iter = model.listStatements(selector);
while (iter.hasNext()) {
    /* same as in the previous example */
}

Edit: including a full example

Here is a full example that works for me.

import com.hp.hpl.jena.util.FileManager;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.SimpleSelector;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.RDFNode;
import com.hp.hpl.jena.rdf.model.Literal;
import com.hp.hpl.jena.rdf.model.StmtIterator;
import com.hp.hpl.jena.rdf.model.Statement;

public class TestJena {

    public static void main(String[] args) {
        FileManager fManager = FileManager.get();
        fManager.addLocatorURL();
        Model model = fManager.loadModel("http://dbpedia.org/data/Ireland.rdf");

        Property geoProperty =
            model.createProperty("http://www.w3.org/2003/01/geo/wgs84_pos#",
                                 "geometry");

        StmtIterator iter =
            model.listStatements(new SimpleSelector(null, geoProperty,(RDFNode) null)); 

        //Loop to traverse the statements that match the SimpleSelector
        while (iter.hasNext()) {
            Statement stmt = iter.nextStatement();
            if (stmt.getObject().isLiteral()) {
                Literal obj = (Literal) stmt.getObject();
                System.out.println("The geometry predicate value is " + 
                                                          obj.getString());
            }   
        }   
    }   

}

This full example prints out:

The geometry predicate value is POINT(-7 53)
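If you then want to run logic on that value, here is a rough sketch of splitting the literal into numbers with plain Java. It assumes the literal is always a 2-D point written as "POINT(x y)", and that x is the longitude and y the latitude (consistent with Ireland lying around 53 N, 7 W). The class name WktPoint is made up for this illustration and is not part of Jena.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jena: parses a 2-D "POINT(x y)" literal.
class WktPoint {

    // Optional whitespace, then two plain decimal numbers separated by whitespace.
    private static final Pattern POINT = Pattern.compile(
        "\\s*POINT\\s*\\(\\s*(-?\\d+(?:\\.\\d+)?)\\s+(-?\\d+(?:\\.\\d+)?)\\s*\\)\\s*");

    /** Returns {x, y}, assumed here to be {longitude, latitude}. */
    static double[] parse(String wkt) {
        Matcher m = POINT.matcher(wkt);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a 2-D WKT point: " + wkt);
        }
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2))
        };
    }

    public static void main(String[] args) {
        // The value printed by the full example above
        double[] p = parse("POINT(-7 53)");
        System.out.println("longitude=" + p[0] + ", latitude=" + p[1]);
        // prints: longitude=-7.0, latitude=53.0
    }
}
```

In the loop of the full example you could call WktPoint.parse(obj.getString()) instead of printing the raw string. Other geometry types would need a real WKT parser.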

Notes on Linked Data

http://dbpedia.org/page/Ireland is the HTML document version of the resource http://dbpedia.org/resource/Ireland

To get the RDF you should resolve:

http://dbpedia.org/data/Ireland.rdf

or

http://dbpedia.org/resource/Ireland with Accept: application/rdf+xml in the HTTP header. With curl it'd be something like:

curl -L -H 'Accept: application/rdf+xml' http://dbpedia.org/resource/Ireland
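The same request can be sketched in plain Java, in case you want to download the RDF yourself before handing it to Jena. This is only a minimal sketch using java.net.HttpURLConnection. The class name DbpediaRdfFetch and the prepare method are made up for this illustration, and the timeouts are arbitrary. One caveat: HttpURLConnection does not follow redirects that switch protocol (http to https), so it may not behave exactly like curl -L.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper: asks a Linked Data URI for its RDF/XML representation.
class DbpediaRdfFetch {

    /** Builds the request with the Accept header set; nothing is sent yet. */
    static HttpURLConnection prepare(String resourceUri) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(resourceUri).toURL().openConnection();
            conn.setRequestProperty("Accept", "application/rdf+xml");
            conn.setInstanceFollowRedirects(true); // same-protocol redirects only
            conn.setConnectTimeout(10_000);        // arbitrary 10 s timeouts
            conn.setReadTimeout(10_000);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java DbpediaRdfFetch <resource-uri>");
            return;
        }
        HttpURLConnection conn = prepare(args[0]);
        System.out.println(conn.getResponseCode() + " " + conn.getContentType());
        try (InputStream in = conn.getInputStream()) {
            // With Jena on the classpath, the stream could be passed to
            // model.read(in, null) here; this sketch only counts the bytes.
            byte[] buf = new byte[8192];
            long total = 0;
            for (int n; (n = in.read(buf)) != -1; ) {
                total += n;
            }
            System.out.println(total + " bytes received");
        }
    }
}
```

Run it with http://dbpedia.org/resource/Ireland as the argument. Without an argument it only prints the usage line and makes no network call.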

Upvotes: 5
