Ankur

Reputation: 51100

Java n-triple RDF parsing

I want to parse an RDF file which is in N-Triples form.

I can write my own parser, but I would rather use a library, and Jena seems unnecessarily complicated for this purpose (or at least I can't find anything in its docs explaining how to read N-Triples in a sensible way).

Could you please point me to any useful libraries? Or, if you know Sesame or Jena well, how could they solve this?

Upvotes: 12

Views: 9275

Answers (3)

Jeen Broekstra

Reputation: 22042

Old question, but since you explicitly ask about different libraries, I thought I'd show how to do simple RDF parsing with Eclipse RDF4J's Rio parser (disclosure: I am one of the RDF4J developers).

For example, to parse the file and put all triples in a Model, just do this:

try (InputStream in = new FileInputStream("/path/to/file.nt")) {
    Model m = Rio.parse(in, "", RDFFormat.NTRIPLES);
    // ... work with the Model m here
}

If you want to immediately print the parser output to stdout (for example in Turtle format), do something like this:

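try (InputStream in = new FileInputStream("/path/to/file.nt")) {
    RDFParser parser = Rio.createParser(RDFFormat.NTRIPLES);
    // The handler receives each parsed statement; here it is a Turtle writer on stdout
    parser.setRDFHandler(Rio.createWriter(RDFFormat.TURTLE, System.out));
    parser.parse(in, "");
}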

There are of course more ways to combine these basic tools; have a look at the RDF4J documentation for details.

The Rio parsers are also available as separate Maven artifacts, so if you only need the parsers, without the rest of the RDF4J tools, you can use just those.
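Roughly speaking, the two snippets differ in one design choice: `Rio.parse` collects every statement into a `Model`, while a parser with a writer set as its handler passes each statement on to that handler as soon as it has been parsed, rather than building a `Model`. The stdlib-only sketch below illustrates that push style. It is not RDF4J code: `TripleHandler` and `StreamingTripleReader` are hypothetical names, and the regex-based line matcher assumes well-formed N-Triples with no whitespace inside the subject or predicate and no trailing comments after the final `.`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical callback, playing roughly the role an RDFHandler plays for a Rio parser. */
interface TripleHandler {
    void handleTriple(String subject, String predicate, String object);
}

/** Hypothetical push-style reader: each triple is handed on as soon as its line is read. */
final class StreamingTripleReader {
    // Assumes the subject and predicate contain no whitespace (true for well-formed
    // N-Triples IRIs and blank node labels); the object is everything before the final '.'.
    private static final Pattern LINE =
            Pattern.compile("^(\\S+)\\s+(\\S+)\\s+(.*\\S)\\s*\\.$");

    static void parse(BufferedReader reader, TripleHandler handler) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comment lines
            }
            Matcher m = LINE.matcher(trimmed);
            if (!m.matches()) {
                throw new IOException("Not a simple N-Triples line: " + line);
            }
            // Nothing is stored here; the handler decides what to do with the triple.
            handler.handleTriple(m.group(1), m.group(2), m.group(3));
        }
    }

    public static void main(String[] args) throws IOException {
        String data = "<http://example.org/s> <http://example.org/p> \"hello world\" .\n"
                + "# a comment\n"
                + "<http://example.org/s> <http://example.org/q> <http://example.org/o> .\n";
        List<String> objects = new ArrayList<>();
        // This handler just records each object term.
        parse(new BufferedReader(new StringReader(data)), (s, p, o) -> objects.add(o));
        System.out.println(objects);
    }
}
```

Swapping the lambda for one that writes to a file or counts statements changes what happens to each triple without touching the reading loop, which is the point of the handler design.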

Upvotes: 5

MarcoS

Reputation: 13564

With Jena it is not so difficult:

Given a file rdfexample.ntriple containing the following RDF in N-Triples form (example taken from here):

<http://www.recshop.fake/cd/Hide your heart> <http://www.recshop.fake/cd#year> "1988" .
<http://www.recshop.fake/cd/Hide your heart> <http://www.recshop.fake/cd#price> "9.90" .
<http://www.recshop.fake/cd/Hide your heart> <http://www.recshop.fake/cd#company> "CBS Records" .
<http://www.recshop.fake/cd/Hide your heart> <http://www.recshop.fake/cd#country> "UK" .
<http://www.recshop.fake/cd/Hide your heart> <http://www.recshop.fake/cd#artist> "Bonnie Tyler" .
<http://www.recshop.fake/cd/Empire Burlesque> <http://www.recshop.fake/cd#year> "1985" .
<http://www.recshop.fake/cd/Empire Burlesque> <http://www.recshop.fake/cd#price> "10.90" .
<http://www.recshop.fake/cd/Empire Burlesque> <http://www.recshop.fake/cd#company> "Columbia" .
<http://www.recshop.fake/cd/Empire Burlesque> <http://www.recshop.fake/cd#country> "USA" .
<http://www.recshop.fake/cd/Empire Burlesque> <http://www.recshop.fake/cd#artist> "Bob Dylan" .

the following code

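import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.util.FileManager;
import java.io.InputStream;

public class ReadNTriples {
    public static void main(String[] args) {
        String fileNameOrUri = "src/a/rdfexample.ntriple";
        Model model = ModelFactory.createDefaultModel();
        // FileManager.open returns null if the file cannot be found
        InputStream is = FileManager.get().open(fileNameOrUri);
        if (is != null) {
            model.read(is, null, "N-TRIPLE");   // parse N-Triples input
            model.write(System.out, "TURTLE");  // print the model as Turtle
        } else {
            System.err.println("cannot read " + fileNameOrUri);
        }
    }
}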

reads the file and prints it out in Turtle form:

<http://www.recshop.fake/cd/Hide your heart>
      <http://www.recshop.fake/cd#artist>
              "Bonnie Tyler" ;
      <http://www.recshop.fake/cd#company>
              "CBS Records" ;
      <http://www.recshop.fake/cd#country>
              "UK" ;
      <http://www.recshop.fake/cd#price>
              "9.90" ;
      <http://www.recshop.fake/cd#year>
              "1988" .

<http://www.recshop.fake/cd/Empire Burlesque>
      <http://www.recshop.fake/cd#artist>
              "Bob Dylan" ;
      <http://www.recshop.fake/cd#company>
              "Columbia" ;
      <http://www.recshop.fake/cd#country>
              "USA" ;
      <http://www.recshop.fake/cd#price>
              "10.90" ;
      <http://www.recshop.fake/cd#year>
              "1985" .

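So, with Jena you can easily parse RDF (in any of the syntaxes it supports, such as RDF/XML, N-Triples or Turtle) into a com.hp.hpl.jena.rdf.model.Model object, which you can then manipulate programmatically. (In Jena 3 and later, the package is org.apache.jena.rdf.model.)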
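The Turtle output is more compact than the N-Triples input mainly because Turtle lets triples that share a subject state it once, separating the predicate/object pairs with `;` and ending the group with `.`. As a stdlib-only sketch of just that grouping step (`TurtleGroupingSketch` is a hypothetical name; this is not how Jena's writer is implemented, and it ignores prefixes and escaping), assuming the triples have already been parsed into `String[3]` arrays:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TurtleGroupingSketch {
    /**
     * Groups already-parsed triples (each a String[3] of subject, predicate, object)
     * by subject and writes each group Turtle-style: the subject once, then
     * predicate/object pairs separated by " ;", with the group ended by " .".
     */
    static String groupBySubject(List<String[]> triples) {
        // LinkedHashMap keeps subjects in first-seen order.
        Map<String, List<String[]>> bySubject = new LinkedHashMap<>();
        for (String[] t : triples) {
            bySubject.computeIfAbsent(t[0], k -> new ArrayList<>()).add(t);
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<String[]>> e : bySubject.entrySet()) {
            out.append(e.getKey()).append('\n');
            List<String[]> group = e.getValue();
            for (int i = 0; i < group.size(); i++) {
                String[] t = group.get(i);
                // The last pair of a group closes the statement; the others continue it.
                String terminator = (i == group.size() - 1) ? " ." : " ;";
                out.append("      ").append(t[1]).append(' ').append(t[2])
                   .append(terminator).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String[]> triples = List.of(
                new String[] {"<http://example.org/cd1>", "<http://example.org/year>", "\"1988\""},
                new String[] {"<http://example.org/cd2>", "<http://example.org/year>", "\"1985\""},
                new String[] {"<http://example.org/cd1>", "<http://example.org/artist>", "\"Bonnie Tyler\""});
        System.out.print(groupBySubject(triples));
    }
}
```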

Upvotes: 8

RobV

Reputation: 28646

If you just want to parse N-Triples and don't need anything beyond basic processing and querying, you could try NxParser. It is a very simple bit of Java code that parses any N-Triples-like format (so N-Quads etc.) and gives you an iterator over the statements in the file. If you only want N-Triples, you can simply ignore statements with fewer or more than three items.

Adapting the example on the linked page would give the following simple code:

import java.io.FileInputStream;
import org.semanticweb.yars.nx.Node;
import org.semanticweb.yars.nx.parser.NxParser;

NxParser nxp = new NxParser(new FileInputStream("filetoparse.nq"), false);

while (nxp.hasNext())
{
  Node[] ns = nxp.next();
  if (ns.length == 3)
  {
    // Only process triples (N-Quads statements have four nodes)
    // Replace the print statements with whatever you want
    for (Node n : ns)
    {
      System.out.print(n.toN3());
      System.out.print(" ");
    }
    System.out.println(".");
  }
}
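NxParser does the tokenizing for you. To make the `ns.length == 3` filter concrete, here is a simplified, stdlib-only sketch of the same idea (the `NxStyleTokenizer` class is hypothetical, not NxParser's implementation): it splits one line into terms and keeps only three-term statements. It assumes one statement per line and does no IRI validation or unescaping, so for real data a proper parser remains the safer choice.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified tokenizer for N-Triples / N-Quads style lines (not NxParser's code). */
final class NxStyleTokenizer {
    /**
     * Splits one line into terms: IRIs (<...>), blank nodes (_:x) and literals
     * ("..." with an optional @lang tag or ^^<datatype>). It does not validate IRIs
     * or unescape literals. Blank lines and comment lines give an empty list.
     */
    static List<String> tokenize(String line) {
        List<String> terms = new ArrayList<>();
        int n = line.length();
        int i = 0;
        while (i < n) {
            char c = line.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '#' || c == '.') {
                break; // a comment or the statement terminator ends the terms
            } else if (c == '<') {
                int end = line.indexOf('>', i);
                if (end < 0) throw new IllegalArgumentException("Unterminated IRI: " + line);
                terms.add(line.substring(i, end + 1));
                i = end + 1;
            } else if (c == '"') {
                int j = i + 1;
                while (j < n && line.charAt(j) != '"') {
                    j += (line.charAt(j) == '\\') ? 2 : 1; // step over escapes such as \"
                }
                if (j >= n) throw new IllegalArgumentException("Unterminated literal: " + line);
                j++; // step past the closing quote
                if (j < n && line.charAt(j) == '@') {
                    j++; // language tag: letters, digits and '-'
                    while (j < n && (Character.isLetterOrDigit(line.charAt(j)) || line.charAt(j) == '-')) j++;
                } else if (line.startsWith("^^<", j)) {
                    int end = line.indexOf('>', j);
                    if (end < 0) throw new IllegalArgumentException("Unterminated datatype: " + line);
                    j = end + 1;
                }
                terms.add(line.substring(i, j));
                i = j;
            } else {
                // Blank node label such as _:b0, read up to the next whitespace;
                // a trailing '.' is treated as the statement terminator.
                int j = i;
                while (j < n && !Character.isWhitespace(line.charAt(j))) j++;
                if (j - i > 2 && line.charAt(j - 1) == '.') j--;
                terms.add(line.substring(i, j));
                i = j;
            }
        }
        return terms;
    }

    /** Keeps only three-term statements, mirroring the ns.length == 3 check above. */
    static List<List<String>> triplesOnly(List<String> lines) {
        List<List<String>> result = new ArrayList<>();
        for (String line : lines) {
            List<String> terms = tokenize(line);
            if (terms.size() == 3) result.add(terms);
        }
        return result;
    }
}
```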

Upvotes: 7
