Arthur Vaïsse

Reputation: 1571

Jena: create a single Statement from a String object

I have to parse N-Triples content and modify every literal of a given datatype.

For example, I have to modify every wktLiteral so that it declares its coordinate reference system. A literal such as:

"POINT (0.0 0.0)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>

must become:

"<http://www.opengis.net/def/crs/EPSG/0/4326> POINT (0.0 0.0)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>

I read each triple line by line into a String object and would like to create a Jena Statement from that String. My aim is to use the Jena parsers and avoid error-prone String manipulation such as split.

For now, the only way I have found to do this is:

String line = "%a triple is here%";
//Create an empty model
final Model model = ModelFactory.createDefaultModel();
//Parse and store the RDF triple in the model
RDFDataMgr.read(model, new ByteArrayInputStream(line.getBytes(StandardCharsets.UTF_8)), Lang.NTRIPLES);
//Get all the statements - only 1 if any
final StmtIterator listStatements = model.listStatements();
//Got my statement
final Statement statement = listStatements.next();

I also tried to use an RDFReader, but I don't know how to then use an RDFOutputStream. Just to get a Statement object created from a String, I have to create a Model and use a reader and an iterator. That seems like overkill to me (I left out most of the checks, such as testing that there actually is a next statement).

Do you know a quicker/simpler way to achieve this?

Arthur.

Upvotes: 0

Views: 1211

Answers (2)

user205512

Reputation: 8888

You'll find using streams more efficient. StreamRDF instances are sent triples as they are encountered. You can then rewrite as appropriate.

Streams use the SPI level of Jena (nodes, triples and quads rather than statements, resources, etc.), which lacks some niceties, but for tasks like this it is ideal.

Given what you've written I suspect writing out fixed N-Triples is what you want? Here's an example that will do that. All it does is 1) create a stream to output triples, 2) create a stream that waits for triples, corrects the object (if needed), and writes the results and 3) starts the whole parse going:

final String wkt = "http://www.opengis.net/ont/geosparql#wktLiteral";

// Stream result to stdout
final StreamRDF outputHandler = StreamRDFLib.writer(System.out);

StreamRDF inputHandler = new StreamRDFBase() {
    @Override
    public void triple(Triple triple) { // Got a triple
        Node object = triple.getObject();

        Node transformed;
        // if object is literal and has wkt type
        if (object.isLiteral() &&
                wkt.equals(object.getLiteralDatatypeURI())) {
            // Make a new node, suitably modified
            transformed = NodeFactory.createLiteral(
                    "<http://www.opengis.net/def/crs/EPSG/0/4326> " 
                            + object.getLiteralLexicalForm(), 
                    object.getLiteralDatatype());
        } else { // Do nothing
            transformed = object;
        }

        // Write out with corrected object
        outputHandler.triple(
                Triple.create( triple.getSubject(), triple.getPredicate(),
                        transformed
                        ));
    }
};

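// Parse; start() and finish() bracket the output stream so that
// buffered output is flushed once parsing is done
outputHandler.start();
RDFDataMgr.parse(inputHandler, RDFDataMgr.open("file"));
outputHandler.finish();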
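
One thing the handler above does not guard against is input where some wktLiteral values already start with a CRS IRI; prefixing those a second time would produce a malformed literal. As a minimal sketch (the class and method names are hypothetical, not part of Jena), the prefixing could be pulled into a small helper that leaves such values alone. It relies on the GeoSPARQL convention that an explicit CRS appears as an IRI in angle brackets at the start of the lexical form:

```java
final class WktCrs {
    // CRS IRI taken from the question; the constant name is an assumption.
    static final String EPSG_4326 = "http://www.opengis.net/def/crs/EPSG/0/4326";

    private WktCrs() {
    }

    /**
     * Returns the lexical form with "<crsIri> " prepended, unless the
     * lexical form already begins with an IRI in angle brackets.
     */
    static String withCrs(String lexicalForm, String crsIri) {
        // WKT geometry text starts with a keyword such as POINT, so a
        // leading '<' is taken to mean that a CRS is already declared.
        if (lexicalForm.trim().startsWith("<")) {
            return lexicalForm;
        }
        return "<" + crsIri + "> " + lexicalForm;
    }
}
```

Inside the handler, the string passed to NodeFactory.createLiteral would then be WktCrs.withCrs(object.getLiteralLexicalForm(), WktCrs.EPSG_4326), which makes it safe to run the rewrite over the same data more than once.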

Upvotes: 2

Joshua Taylor

Reputation: 85813

I don't know if you'll find a better way than what you've got, really, except that you should probably read in chunks of the file rather than each individual line. If you read in chunks of the file, then you can transform the whole chunk using a simple SPARQL CONSTRUCT query. That will provide a new model, and you can append the N-Triples serialization of that model to your output file (or insert it into a new graph, etc.). Suppose you've got this data:

<urn:ex:a> <urn:ex:p> <urn:ex:b>.
<urn:ex:c> <urn:ex:q> "POINT (0.0 0.0)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>.

Then a query like this will produce the following updated model:

construct { ?s ?p ?oo }
where {

  #-- constant values pulled out for readability; this
  #-- is optional, of course.
  values (?dt ?prefix) {
    (<http://www.opengis.net/ont/geosparql#wktLiteral>
    "<http://www.opengis.net/def/crs/EPSG/0/4326> ") 
  }

  #-- grab each triple, and bind ?oo to ?o if it doesn't 
  #-- need to be updated, or to a new literal, if it does.
  ?s ?p ?o .
  bind( if( isLiteral(?o) && datatype(?o) = ?dt,
            strdt( concat(?prefix,str(?o)), ?dt ),
            ?o )
        as ?oo )
}

<urn:ex:a> <urn:ex:p> <urn:ex:b> .
<urn:ex:c> <urn:ex:q> "<http://www.opengis.net/def/crs/EPSG/0/4326> POINT (0.0 0.0)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> .

If you were to load the whole dataset into a TDB instance, you could transform the whole dataset relatively easily with some variant of this, and then just dump the final data into a new file.
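
For the chunked reading suggested at the start of this answer, here is a minimal sketch using only the standard library. The class name, method name and callback shape are my own assumptions, not anything Jena provides. Because N-Triples puts one triple per line, cutting the input on line boundaries leaves every chunk a parseable N-Triples document in its own right:

```java
import java.io.BufferedReader;
import java.util.Iterator;
import java.util.function.Consumer;

final class NTriplesChunker {
    private NTriplesChunker() {
    }

    /**
     * Reads the input line by line and hands it to the callback in chunks
     * of at most linesPerChunk lines, so the whole file is never held in
     * memory at once. Each chunk keeps a trailing newline per line.
     */
    static void forEachChunk(BufferedReader reader, int linesPerChunk,
                             Consumer<String> chunkHandler) {
        if (linesPerChunk < 1) {
            throw new IllegalArgumentException("linesPerChunk must be >= 1");
        }
        StringBuilder current = new StringBuilder();
        int count = 0;
        // lines() reports I/O failures as UncheckedIOException
        Iterator<String> lines = reader.lines().iterator();
        while (lines.hasNext()) {
            current.append(lines.next()).append('\n');
            count++;
            if (count == linesPerChunk) {
                chunkHandler.accept(current.toString());
                current.setLength(0);
                count = 0;
            }
        }
        // Hand over the final, possibly shorter, chunk
        if (count > 0) {
            chunkHandler.accept(current.toString());
        }
    }
}
```

Each chunk can then be read into a fresh model the same way the question reads a single line (RDFDataMgr.read with Lang.NTRIPLES over the chunk's UTF-8 bytes), transformed with the CONSTRUCT query, and appended to the output. One caveat, as far as I know: blank node labels are scoped to a single parse, so data whose blank nodes are shared across chunk boundaries may need extra care.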

Upvotes: 3
