Reputation: 33
I am looking to understand how the SAIL interface of Sesame works with respect to the SPARQL update statements (ADD, COPY, ...) and how these are propagated to the actual implementations of that interface.
For instance, is it correct that the SPARQL ADD statement is implemented by the executeAdd method in the SailUpdateExecutor class? See https://bitbucket.org/openrdf/sesame/src/aa0dd3b04738e707c582e4dda14a0a0c5a77ab51/core/repository/sail/src/main/java/org/openrdf/repository/sail/helpers/SailUpdateExecutor.java?at=master&fileviewer=file-view-default .
If this is correct, do I interpret correctly that the SAIL layer extracts triple by triple from the source graph and inserts them into the target graph?
If so, is it possible to override this behaviour for SAIL implementations? For instance, I think this operation could be implemented efficiently on the native RDF store by native bulk index operations. The generic implementation cannot take any advantage of the internal data structures, and hence the execution will not be optimal.
If this has not been foreseen in the SAIL interface, is there any Sesame query interface that applies this strategy of pushing the query as much as possible down to the store? Or is the strategy the opposite: each part of the query is evaluated as soon as it can be?
Finally, can the query execution strategy be tuned? I find references in the source code to query execution optimisers, but I cannot find whether they can be configured per store instance.
Feedback is appreciated.
Upvotes: 1
Views: 105
Reputation: 22042
is it correct that the SPARQL ADD statement is implemented by the executeAdd method in the SailUpdateExecutor class?
That is correct.
do I interpret correctly that the SAIL layer extracts triple by triple from the source graph and inserts them into the target graph
Yes, that is the default implementation.
However, notice that how the underlying store chooses to actually process the inserted data is not prescribed by this. It can simply add triples one by one as they come in, or it can choose to batch things together in whatever way is most efficient for that particular store. This is why the UpdateContext object is provided along with the actual insertion - it is a marker object that informs the underlying store that these inserts belong to the same overall update operation.
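The batching opportunity this marker object enables can be sketched as follows. The types below are simplified, hypothetical stand-ins; the real Sesame methods are SailConnection.startUpdate/addStatement/endUpdate, which take Resource, URI and Value arguments rather than plain strings:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for org.openrdf.sail.UpdateContext: a marker identifying one
// overall update operation.
class UpdateContext { }

// Simplified triple; the real API passes subject/predicate/object separately.
class Triple {
    final String subj, pred, obj;
    Triple(String subj, String pred, String obj) {
        this.subj = subj; this.pred = pred; this.obj = obj;
    }
}

// A connection that buffers every statement belonging to one update
// operation and flushes them in a single bulk call when the operation ends.
class BatchingConnection {
    private final List<Triple> batch = new ArrayList<>();
    private int bulkFlushes = 0;

    void startUpdate(UpdateContext op) {
        batch.clear();
    }

    void addStatement(UpdateContext op, Triple st) {
        // do not touch the indexes yet; just collect
        batch.add(st);
    }

    void endUpdate(UpdateContext op) {
        // a native store could hand the whole batch to its indexes at once
        bulkFlushes++;
        batch.clear();
    }

    int bulkFlushes() {
        return bulkFlushes;
    }
}
```

Even though the SailUpdateExecutor feeds statements in one at a time, all of them arrive tagged with the same UpdateContext, so the store sees exactly one bulk flush per update operation.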
If so, is it possible to override this behaviour for SAIL implementations? For instance, I think this operation could be implemented efficiently on the native RDF store by native bulk index operations. The generic implementation cannot take any advantage of the internal data structures, and hence the execution will not be optimal.
As said above, the underlying store can do this in the SailConnection method implementations that have an UpdateContext object as one of their parameters.
The SailUpdateExecutor merely executes the update at a logical level. Optimizing its execution is still completely up to the underlying store.
If this has not been foreseen in the SAIL interface, is there any Sesame query interface that applies this strategy of pushing the query as much as possible down to the store? Or is the strategy the opposite: each part of the query is evaluated as soon as it can be?
I'm not quite sure I follow, but queries are always completely pushed to the underlying store. The only thing that happens beforehand is that the query gets parsed and transformed into an algebra model. That algebra model is sent to the underlying store, which has complete freedom to optimize/transform/execute it in whatever way it prefers. Of course, a default evaluation strategy is provided for convenience, but that can be overridden.
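That flow can be sketched roughly as follows, again with simplified, hypothetical stand-in types (the real Sesame counterparts are the TupleExpr algebra tree and the EvaluationStrategy used inside SailConnection.evaluate):

```java
import java.util.Collections;
import java.util.Iterator;

// Stand-in for the parsed algebra tree (the real class is
// org.openrdf.query.algebra.TupleExpr).
class AlgebraTree {
    final String root;
    AlgebraTree(String root) { this.root = root; }
}

// A store may plug in its own strategy for evaluating an algebra tree.
interface EvaluationStrategy {
    Iterator<String> evaluate(AlgebraTree tree);
}

// Analogous to the generic default evaluation Sesame provides for convenience.
class DefaultStrategy implements EvaluationStrategy {
    public Iterator<String> evaluate(AlgebraTree tree) {
        return Collections.singletonList("generic:" + tree.root).iterator();
    }
}

// The connection receives the *whole* parsed tree - nothing is evaluated
// before it reaches the store - and is free to substitute its own strategy.
class Connection {
    private final EvaluationStrategy strategy;
    Connection(EvaluationStrategy strategy) { this.strategy = strategy; }
    Iterator<String> evaluate(AlgebraTree tree) {
        return strategy.evaluate(tree);
    }
}
```

The key point is that the store sees the complete query model up front, so a native implementation can substitute DefaultStrategy with one that exploits its own indexes.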
Finally, can the query execution strategy be tuned? I find references in the source code to query execution optimisers, but I cannot find whether they can be configured per store instance.
Most Sail store instances do not make these optimizers configurable - they simply apply them internally as they see fit. This typically happens in the SailConnection.evaluate method (or more specifically, in AbstractSailConnection.evaluateInternal). This is also where a store implementation chooses which evaluation strategy to use (either the default, or its own optimized strategy).
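The shape of that internal optimizer chain can be illustrated like this, with simplified stand-in types (the real interface is org.openrdf.query.algebra.evaluation.QueryOptimizer, and Sesame ships concrete optimizers such as QueryJoinOptimizer; the fixed, non-configurable chain per store is the point being mirrored here):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the mutable algebra tree that optimizers rewrite in place.
class Algebra {
    String plan;
    Algebra(String plan) { this.plan = plan; }
}

// Stand-in for QueryOptimizer: rewrites the algebra before evaluation.
interface Optimizer {
    void optimize(Algebra algebra);
}

// Mirrors what a store does inside evaluateInternal(): run a fixed chain of
// optimizers over the algebra tree, then evaluate the optimized result.
// The chain is chosen by the store itself, not configured by the caller.
class StoreConnection {
    private final List<Optimizer> optimizers = new ArrayList<>();

    StoreConnection() {
        // e.g. a join-reordering step, analogous to QueryJoinOptimizer
        optimizers.add(a -> a.plan = a.plan.replace("join(B,A)", "join(A,B)"));
    }

    Algebra evaluate(Algebra algebra) {
        for (Optimizer o : optimizers) {
            o.optimize(algebra);
        }
        // evaluation of the optimized tree would happen here
        return algebra;
    }
}
```

A store that wants different behaviour overrides this method (or the evaluation strategy it constructs) rather than exposing the optimizer list as configuration.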
Upvotes: 1