Reputation: 11865
I am playing with Blazegraph. I insert some triples representing 'events', each of 'event' contains 3 triples and looks like this:
<%event-iri%> <http://predicates/timestamp> '2020-01-02T03:04:05.000Z'^^xsd:dateTime .
<%event-iri%> <http://predicates/a> %RANDOM_UUID% .
<%event-iri%> <http://predicates/b> %RANDOM_UUID% .
Timestamps represent consecutive moments of time, each next event is 1 minute later than the previous one.
I have 10 million 'events' (so 30 million triples) in the graph.
I run the following query:
select ?event
where {
?event <http://predicates/timestamp> ?timestamp .
}
order by ?timestamp
limit 15
I expect it to be executed effectively using POS index for ordering, but it looks like it attempts to do a sort of all 10 million timestamps in memory as the query takes more than 2 minutes to execute.
Adding a hint saying that a range query is safe does not help either (well, the hint is about range queries, but I decided to still try it as it seems to actually be about 'all objects of this predicate are of the same type'):
select ?event
where {
?event <http://predicates/timestamp> ?timestamp .
hint:Prior hint:rangeSafe true .
}
order by ?timestamp
limit 15
Still same 2 minutes and 20 seconds.
In relational databases and MongoDB you just add an index and such queries work fast.
Is there a way to execute such a query efficiently on Blazegraph?
Upvotes: 0
Views: 244
Reputation: 11479
From Query Hints:
Blazegraph stores data in their natural (lexicographic) order in the POS(C) index and the OSP(C) index.
That order is not preserved through joins, so one can try something like this:
SELECT ?event ?timestamp {
{
hint:Query hint:optimizer "None"
SELECT ?event ?timestamp {
?event <http://predicates/timestamp> ?timestamp .
# prevent "random" sampling
hint:Query hint:pipelinedHashJoin false
} LIMIT 200
}
?event <http://predicates/c> :c
} ORDER BY ?timestamp LIMIT 150
However, my experiments show that result constancy is not guaranteed even using subtler hints. And obviously, this approach can not be extended to DESC
ordering.
If events are so regular, you could estimate an upper bound yourself:
SELECT ?event {
VALUES ?startTime {"2000-11-26T12:00:00"^^xsd:dateTime}
BIND (STRDT(CONCAT("PT", STR(200), "M"), xsd:duration) AS ?duration)
BIND (?startTime + ?duration AS ?endTime)
?event <http://predicates/timestamp> ?timestamp .
hint:Prior hint:rangeSafe true
FILTER (?timestamp < ?endTime)
?event <http://predicates/c> :c
} ORDER BY ?timestamp LIMIT 150
Upvotes: 2