Reputation: 1571
I have to parse N-Triples content and apply a modification to every literal of a given type.
For example, I have to modify every WKT literal so that it uses an explicit coordinate reference system. Literals such as:
"POINT (0.0 0.0)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>
Must become:
"<http://www.opengis.net/def/crs/EPSG/0/4326> POINT (0.0 0.0)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>
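However the line ends up being parsed, the rewrite itself is a small function of the literal's datatype URI and lexical form. Below is a minimal plain-Java sketch. `WktCrs` and `rewrite` are hypothetical names, not Jena API. It also assumes that a lexical form already starting with `<` carries a CRS IRI and leaves it alone, so running it twice does not double the prefix.

```java
// Hypothetical helper, not part of Jena: the lexical rewrite described above.
final class WktCrs {
    static final String WKT_LITERAL = "http://www.opengis.net/ont/geosparql#wktLiteral";
    static final String DEFAULT_CRS = "<http://www.opengis.net/def/crs/EPSG/0/4326>";

    private WktCrs() {}

    // Returns the lexical form to use for a literal with the given datatype URI.
    static String rewrite(String datatypeUri, String lexicalForm) {
        if (!WKT_LITERAL.equals(datatypeUri)) {
            return lexicalForm; // not a wktLiteral: leave untouched
        }
        if (lexicalForm.startsWith("<")) {
            return lexicalForm; // assumption: already carries an explicit CRS IRI
        }
        return DEFAULT_CRS + " " + lexicalForm;
    }
}
```

When a parser does the tokenizing, this is the only string handling left. It operates on the literal's lexical form, never on the raw N-Triples line.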
I read each triple, line by line, into a String object and would like to create a Jena Statement from that String. My aim is to use the Jena parsers to avoid dirty String manipulation (such as split), which is error prone.
For now, the only way I have found to do this is:
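import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import org.apache.jena.rdf.model.*;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;

String line = "%a triple is here%";
// Create an empty model
final Model model = ModelFactory.createDefaultModel();
// Parse and store the RDF triple in the model
RDFDataMgr.read(model, new ByteArrayInputStream(line.getBytes(StandardCharsets.UTF_8)), Lang.NTRIPLES);
// Get all the statements (only 1, if any)
final StmtIterator listStatements = model.listStatements();
// Got my statement (real code should check listStatements.hasNext() first)
final Statement statement = listStatements.next();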
I also tried to use an RDFReader, but I don't know how to then use an RDFOutputStream... Just to get a Statement object created from a String, I have to create a Model, use a reader and an iterator. That seems like overkill to me (and I cut out most of the checks, such as testing that there actually is a next statement).
Do you know a quicker/simpler way to achieve this?
Arthur.
Upvotes: 0
Views: 1211
Reputation: 8888
You'll find using streams more efficient. StreamRDF instances receive triples as they are encountered, so you can rewrite each one as appropriate.
Streams use the SPI level of Jena (nodes, triples and quads rather than statements, resources, etc.). That level lacks some niceties, but for tasks like this it is ideal.
Given what you've written, I suspect writing out fixed N-Triples is what you want. Here's an example that does that. It does three things: 1) it creates a stream that outputs triples, 2) it creates a stream that receives triples, corrects the object (if needed) and writes the results, and 3) it starts the whole parse going:
import org.apache.jena.graph.Node;
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.graph.Triple;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.system.StreamRDF;
import org.apache.jena.riot.system.StreamRDFBase;
import org.apache.jena.riot.system.StreamRDFLib;

final String wkt = "http://www.opengis.net/ont/geosparql#wktLiteral";
// Stream result to stdout
final StreamRDF outputHandler = StreamRDFLib.writer(System.out);
StreamRDF inputHandler = new StreamRDFBase() {
    @Override
    public void triple(Triple triple) { // Got a triple
        Node object = triple.getObject();
        Node transformed;
        // If the object is a literal with the WKT datatype
        if (object.isLiteral() &&
                wkt.equals(object.getLiteralDatatypeURI())) {
            // Make a new node, suitably modified
            transformed = NodeFactory.createLiteral(
                    "<http://www.opengis.net/def/crs/EPSG/0/4326> "
                    + object.getLiteralLexicalForm(),
                    object.getLiteralDatatype());
        } else { // Do nothing
            transformed = object;
        }
        // Write out with corrected object
        outputHandler.triple(
                Triple.create(triple.getSubject(), triple.getPredicate(),
                        transformed));
    }
};
// Parse; start() and finish() bracket the output so the writer flushes its buffer
outputHandler.start();
RDFDataMgr.parse(inputHandler, RDFDataMgr.open("file"));
outputHandler.finish();
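To show the shape of this streaming pattern without pulling in Jena, here is a plain-Java sketch. `Term`, `SimpleTriple`, `TripleSink`, `RewritingSink` and `CollectingSink` are hypothetical stand-ins for Jena's `Node`, `Triple` and `StreamRDF`. They are not Jena API. The sketch only illustrates the idea of a sink that rewrites the object of each triple as it arrives and forwards it downstream. It assumes Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Jena Node: datatypeUri == null means "not a literal" (e.g. an IRI).
record Term(String value, String datatypeUri) {
    boolean isLiteral() {
        return datatypeUri != null;
    }
}

// Hypothetical stand-in for a Jena Triple.
record SimpleTriple(String subject, String predicate, Term object) {}

// Hypothetical stand-in for StreamRDF: receives triples one at a time.
interface TripleSink {
    void triple(SimpleTriple t);
}

// Rewrites WKT literal objects and forwards every triple downstream as it arrives.
final class RewritingSink implements TripleSink {
    static final String WKT = "http://www.opengis.net/ont/geosparql#wktLiteral";
    static final String CRS = "<http://www.opengis.net/def/crs/EPSG/0/4326>";

    private final TripleSink downstream;

    RewritingSink(TripleSink downstream) {
        this.downstream = downstream;
    }

    @Override
    public void triple(SimpleTriple t) {
        Term object = t.object();
        Term transformed = object; // default: pass through unchanged
        if (object.isLiteral() && WKT.equals(object.datatypeUri())) {
            transformed = new Term(CRS + " " + object.value(), object.datatypeUri());
        }
        downstream.triple(new SimpleTriple(t.subject(), t.predicate(), transformed));
    }
}

// Collects whatever reaches it; plays the role of the N-Triples writer here.
final class CollectingSink implements TripleSink {
    final List<SimpleTriple> collected = new ArrayList<>();

    @Override
    public void triple(SimpleTriple t) {
        collected.add(t);
    }
}
```

Each triple is handled once and forwarded immediately, so memory use does not grow with the size of the input. This is the main advantage of streaming over loading a Model per line.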
Upvotes: 2
Reputation: 85813
I don't know if you'll find a better way than what you've got, really, except that you should probably read the file in chunks rather than line by line. If you read it in chunks, you can transform each whole chunk with a simple SPARQL CONSTRUCT query. That gives you a new model, and you can append the N-Triples serialization of that model to your output file (or insert it into a new graph, etc.). Suppose you've got this data:
<urn:ex:a> <urn:ex:p> <urn:ex:b>.
<urn:ex:c> <urn:ex:q> "POINT (0.0 0.0)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>.
Then a query like the following will produce the updated model shown after it:
construct { ?s ?p ?oo }
where {
#-- constant values pulled out for readability; this
#-- is optional, of course.
values (?dt ?prefix) {
(<http://www.opengis.net/ont/geosparql#wktLiteral>
"<http://www.opengis.net/def/crs/EPSG/0/4326> ")
}
#-- grab each triple, and bind ?oo to ?o if it doesn't
#-- need to be updated, or to a new literal, if it does.
?s ?p ?o .
bind( if( isLiteral(?o) && datatype(?o) = ?dt,
strdt( concat(?prefix,str(?o)), ?dt ),
?o )
as ?oo )
}
<urn:ex:a> <urn:ex:p> <urn:ex:b> .
<urn:ex:c> <urn:ex:q> "<http://www.opengis.net/def/crs/EPSG/0/4326> POINT (0.0 0.0)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> .
If you were to load the whole dataset into a TDB instance, you could transform the whole dataset relatively easily with some variant of this, and then just dump the final data into a new file.
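Reading in chunks rather than single lines could look like the following plain-Java sketch. `ChunkedLines.forEachChunk` is a hypothetical helper, not Jena API. N-Triples puts one triple per line, so cutting at line boundaries keeps every triple whole. Each chunk could then be loaded into a fresh Model and run through the CONSTRUCT query above. There is one caveat if each chunk is parsed separately: blank node labels are scoped to a single parse, so `_:b0` appearing in two different chunks would likely end up as two different blank nodes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper: groups input lines into chunks of at most chunkSize lines.
final class ChunkedLines {
    private ChunkedLines() {}

    // Hands each chunk (lines joined with '\n', trailing newline included) to the consumer.
    static void forEachChunk(BufferedReader reader, int chunkSize, Consumer<String> consumer) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        StringBuilder chunk = new StringBuilder();
        int count = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.append(line).append('\n');
                if (++count == chunkSize) {
                    consumer.accept(chunk.toString()); // a full chunk is ready
                    chunk.setLength(0);
                    count = 0;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (count > 0) {
            consumer.accept(chunk.toString()); // the last, possibly shorter, chunk
        }
    }
}
```

The chunk size is a trade-off. Larger chunks mean fewer parse and query runs but more memory per Model.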
Upvotes: 3