Bert Van Nuffelen

Reputation: 33

RDF Sesame querying strategy for the sail implementation

I am trying to understand how Sesame's Sail interface handles SPARQL update statements (ADD, COPY, ...) and how these are propagated to the actual implementations of that interface.

For instance, is it correct that the SPARQL ADD statement is implemented by the executeAdd method in the SailUpdateExecutor class? See https://bitbucket.org/openrdf/sesame/src/aa0dd3b04738e707c582e4dda14a0a0c5a77ab51/core/repository/sail/src/main/java/org/openrdf/repository/sail/helpers/SailUpdateExecutor.java?at=master&fileviewer=file-view-default .

If this is correct, am I right in interpreting this to mean that the Sail layer extracts triples one by one from the source graph and inserts them into the target graph?

If so, is it possible to override this behaviour for Sail implementations? For instance, I think this operation could be implemented efficiently on the native RDF store using native bulk index operations. The generic implementation cannot take advantage of the store's internal data structures, so its execution will not be optimal.

If this has not been foreseen in the Sail interface, do any of the Sesame query interfaces apply this strategy of pushing the query down to the store as much as possible? Or is the strategy the opposite: when the query can be explored, it is done immediately?

Finally, can the query execution strategy be tuned? I found references to query execution optimisers in the source code, but I cannot find whether they can be configured per store instance.

Feedback is appreciated.

Upvotes: 1

Views: 105

Answers (1)

Jeen Broekstra

Reputation: 22042

Is it correct that the SPARQL ADD statement is implemented by the executeAdd method in the SailUpdateExecutor class?

That is correct.

Am I right in interpreting this to mean that the Sail layer extracts triples one by one from the source graph and inserts them into the target graph?

Yes, that is the default implementation.

Note, however, that this does not prescribe how the underlying store actually processes the inserted data. The store can simply add the triples one by one as they come in, or it can batch them together in whatever way is most efficient for that particular store. This is why the UpdateContext object is passed along with each insertion: it is a marker object that tells the underlying store that these inserts belong to the same overall update operation.

If so, is it possible to override this behaviour for Sail implementations? For instance, I think this operation could be implemented efficiently on the native RDF store using native bulk index operations. The generic implementation cannot take advantage of the store's internal data structures, so its execution will not be optimal.

As said above, the underlying store can do this in the SailConnection method implementations that have an UpdateContext object as one of their parameters.

The SailUpdateExecutor merely executes the update at a logical level. Optimizing its execution is still completely up to the underlying store.
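To make the batching idea concrete, here is a minimal sketch in plain Java. It is only an illustration: UpdateMarker, Quad and Store are hypothetical stand-ins, not Sesame classes, and the method names merely echo the idea of update-scoped SailConnection methods. The generic executor still copies triple by triple, while the store uses the marker to buffer the inserts and apply them in one bulk write.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins, NOT the Sesame API: they only mimic a marker object
// that groups the inserts belonging to one update operation.
final class BatchingStoreSketch {

    /** Marker for one logical update operation (plays the role of UpdateContext). */
    static final class UpdateMarker { }

    /** A triple together with the graph it lives in. */
    static final class Quad {
        final String subject, predicate, object, graph;

        Quad(String subject, String predicate, String object, String graph) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.graph = graph;
        }
    }

    /** A store that buffers inserts per update and writes each buffer in one go. */
    static final class Store {
        private final Map<UpdateMarker, List<Quad>> pending = new HashMap<>();
        private final List<Quad> committed = new ArrayList<>();
        private int bulkWrites = 0;

        void startUpdate(UpdateMarker op) {
            pending.put(op, new ArrayList<>());
        }

        void addStatement(UpdateMarker op, Quad quad) {
            List<Quad> buffer = pending.get(op);
            if (buffer == null) {
                // No open update for this marker: fall back to writing through.
                committed.add(quad);
                bulkWrites++;
            } else {
                buffer.add(quad);
            }
        }

        void endUpdate(UpdateMarker op) {
            List<Quad> buffer = pending.remove(op);
            if (buffer != null && !buffer.isEmpty()) {
                committed.addAll(buffer); // stands in for a bulk index operation
                bulkWrites++;
            }
        }

        List<Quad> statements() { return Collections.unmodifiableList(committed); }

        int bulkWrites() { return bulkWrites; }
    }

    /** Generic, store-agnostic ADD: hands over the source triples one at a time. */
    static void executeAdd(Store store, List<Quad> source, String targetGraph) {
        UpdateMarker op = new UpdateMarker();
        store.startUpdate(op);
        try {
            for (Quad q : source) {
                store.addStatement(op, new Quad(q.subject, q.predicate, q.object, targetGraph));
            }
        } finally {
            store.endUpdate(op);
        }
    }
}
```

The executor never needs to know whether the store batches or not; that decision stays entirely inside Store, which is the division of labour described above.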

If this has not been foreseen in the Sail interface, do any of the Sesame query interfaces apply this strategy of pushing the query down to the store as much as possible? Or is the strategy the opposite: when the query can be explored, it is done immediately?

I'm not quite sure I follow, but queries are always completely pushed to the underlying store. The only thing that happens beforehand is that the query gets parsed and transformed into an algebra model. That algebra model is sent to the underlying store, which has complete freedom to optimize/transform/execute it in whatever way it prefers. Of course, a default evaluation strategy is provided for convenience, but that can be overridden.

Finally, can the query execution strategy be tuned? I found references to query execution optimisers in the source code, but I cannot find whether they can be configured per store instance.

Most Sail store instances do not make these optimizers configurable; they simply apply them internally as they see fit. This typically happens in the SailConnection.evaluate method (or, more specifically, in AbstractSailConnection.evaluateInternal). This is also where a store implementation chooses which evaluation strategy to use (either the default, or its own optimized strategy).
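As a rough picture of that flow, here is a minimal sketch in plain Java. Everything in it is hypothetical (Pattern, Optimizer, Strategy, StoreConnection and the toy "smallest first" reordering are illustrative names and are not Sesame classes); it only shows a store receiving an already-parsed model, running its own fixed optimizer pipeline, and then handing the result to the evaluation strategy it was built with.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins, NOT the Sesame API: a toy "algebra model" and the
// store-internal pipeline that optimizes and then evaluates it.
final class EvaluationPipelineSketch {

    /** One node of a toy algebra model: a named pattern with an estimated result size. */
    static final class Pattern {
        final String name;
        final int estimatedSize;

        Pattern(String name, int estimatedSize) {
            this.name = name;
            this.estimatedSize = estimatedSize;
        }
    }

    /** Rewrites a model into an equivalent, hopefully cheaper one. */
    interface Optimizer {
        List<Pattern> optimize(List<Pattern> model);
    }

    /** Turns an (optimized) model into a result; here just a plan description. */
    interface Strategy {
        String evaluate(List<Pattern> model);
    }

    /** Toy optimizer: evaluate the most selective patterns first. */
    static final class SmallestFirst implements Optimizer {
        @Override
        public List<Pattern> optimize(List<Pattern> model) {
            List<Pattern> copy = new ArrayList<>(model); // leave the input untouched
            copy.sort(Comparator.comparingInt((Pattern p) -> p.estimatedSize));
            return copy;
        }
    }

    /** Toy default strategy: join the patterns in the order given. */
    static final class DefaultStrategy implements Strategy {
        @Override
        public String evaluate(List<Pattern> model) {
            StringBuilder plan = new StringBuilder();
            for (Pattern p : model) {
                if (plan.length() > 0) {
                    plan.append(" JOIN ");
                }
                plan.append(p.name);
            }
            return plan.toString();
        }
    }

    /** The store decides internally which optimizers and which strategy to use. */
    static final class StoreConnection {
        private final List<Optimizer> optimizers;
        private final Strategy strategy;

        StoreConnection(List<Optimizer> optimizers, Strategy strategy) {
            this.optimizers = new ArrayList<>(optimizers);
            this.strategy = strategy;
        }

        String evaluate(List<Pattern> model) {
            List<Pattern> current = model;
            for (Optimizer optimizer : optimizers) {
                current = optimizer.optimize(current);
            }
            return strategy.evaluate(current);
        }
    }
}
```

In this sketch the optimizer list and the strategy are fixed when the connection is created, which mirrors the situation described above: the choice is made inside the store implementation, and a store with its own strategy would simply pass a different Strategy there.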

Upvotes: 1
