Reputation: 450
I have this SPARQL query that I ran through Wikidata's endpoint:
SELECT ?bLabel ?b ?hLabel ?a ?cLabel
WHERE
{
wd:Q11462 ?a ?b.
wd:Q11095 ?a ?b.
?c ?a ?b.
?h wikibase:directClaim ?a .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
Essentially, I'm looking for relationships that are shared by wd:Q11462 and wd:Q11095, and then for what else shares those relationships. The query hits the 60-second time limit.
However, it behaves very differently if I split it into multiple queries in two parts:
First, obtain the shared relationships:
SELECT ?bLabel ?b ?hLabel ?a
WHERE
{
wd:Q11462 ?a ?b.
wd:Q11095 ?a ?b.
?h wikibase:directClaim ?a .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
Then, for each relationship obtained, run a query that finds what else shares it with them:
"""
SELECT ?cLabel
WHERE
{
?c wdt:P131 wd:Q3586.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
"""
Altogether, these queries run in only 2.5 seconds.
Due to constraints, I would like to reach that same speed with only a single query. What should I do?
Upvotes: 1
Views: 118
Reputation: 16384
Here is an approach that uses a subquery. It takes six seconds:
SELECT ?cLabel
WITH {
SELECT ?bLabel ?b ?hLabel ?a
WHERE {
wd:Q11462 ?a ?b.
wd:Q11095 ?a ?b.
?h wikibase:directClaim ?a .
}
} as %results
WHERE {
INCLUDE %results.
?c wdt:P131 wd:Q3586.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
Subqueries are the natural extension given the stark difference you observed and how close they are conceptually to your approach of running multiple queries consecutively. A more generic trick that often helps is replacing the label service with a manual query for labels.
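As a minimal sketch of that label trick, here is the second-step query from the question with the label service swapped for a plain rdfs:label lookup. The prefixes are spelled out only to make the snippet self-contained (the Wikidata endpoint predefines them), and I have not timed this variant:

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?c ?cLabel
WHERE {
  ?c wdt:P131 wd:Q3586.
  # Manual label lookup in place of SERVICE wikibase:label.
  # OPTIONAL keeps items that have no English label.
  OPTIONAL {
    ?c rdfs:label ?cLabel.
    FILTER(LANG(?cLabel) = "en")
  }
}
```

One difference to keep in mind: the label service falls back to the item ID when no label exists, whereas here ?cLabel simply stays unbound.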
After switching to some items with fewer (common) statements, I convinced the query service to explain itself. I can't quite claim to understand that output, but as far as I can tell it's the label service that's throwing it off (Row 5 in the table at the bottom):
9 com.bigdata.bop.BOp.bopId
CONTROLLER com.bigdata.bop.BOp.evaluationContext
false com.bigdata.bop.PipelineOp.pipelined
true com.bigdata.bop.PipelineOp.sharedState
ServiceNode com.bigdata.bop.controller.ServiceCallJoin.serviceNode
wdq com.bigdata.bop.controller.ServiceCallJoin.namespace
1596209250127 com.bigdata.bop.controller.ServiceCallJoin.timestamp
[b, h, c] com.bigdata.bop.join.HashJoinAnnotations.joinVars
null com.bigdata.bop.join.JoinAnnotations.constraints
It seems as if it tries to populate labels for 20000+ items at that point. Apart from just leaving the label service out of the first query, the query service also accepts hints (a Blazegraph extension rather than standard SPARQL) as to the ideal sequence of operations, which might be useful here.
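For illustration, one such hint is Blazegraph's hint:Query hint:optimizer "None", which turns off join reordering so that the patterns are evaluated in the order they are written. The sketch below applies it to the original single query with the label service placed last. This is an assumption about what might help, not a measured result, and it may well still time out:

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX hint: <http://www.bigdata.com/queryHints#>

SELECT ?c ?cLabel ?a ?b
WHERE {
  # Blazegraph-specific: disable the join optimizer so the
  # triple patterns below run in written order.
  hint:Query hint:optimizer "None".
  # First narrow down to the statements both items share...
  wd:Q11462 ?a ?b.
  wd:Q11095 ?a ?b.
  ?h wikibase:directClaim ?a.
  # ...then look for other subjects with the same statement.
  ?c ?a ?b.
  # Labels are resolved last, on the already reduced result set.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
```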
Upvotes: 1