Richard Smith

Reputation: 3475

Graph restrictions to reduce dataset in SPARQL

I am writing a SPARQL query for a dataset containing thousands of graphs, and I want to compute on the fly which graphs are to be included in my search. As an example, I might only want to include graphs written by me this year. When the query is straightforward, this is easy enough to do with a restriction on the graph URI:

SELECT ?name WHERE {
  GRAPH ?g { [ foaf:name ?name ] }
  ?g dc:creator ex:me ; dc:created ?date
  FILTER( xsd:dateTime(?date) >= xsd:dateTime("2015-01-01") )
}

But suppose my query is more complex and I want a list of pairs of acquaintances. The naïve implementation is something like this:

SELECT ?name WHERE {
  GRAPH ?g { ?name1 ^foaf:name/foaf:knows/foaf:name ?name2 }
  ?g dc:creator ex:me ; dc:created ?date
  FILTER( xsd:dateTime(?date) >= xsd:dateTime("2015-01-01") )
}

This works fine if all three FOAF triples are in the same graph. But if any of them is in a different graph, it fails because ?g binds to a single graph in each result. I can explicitly write each of the three FOAF triples in its own GRAPH block, but then I have to associate each with its own graph URI variable, and repeat the graph restriction for each:

SELECT ?name WHERE {
  GRAPH ?g1 { ?p1 foaf:name ?name1 }
  ?g1 dc:creator ex:me ; dc:created ?date1
  FILTER( xsd:dateTime(?date1) >= xsd:dateTime("2015-01-01") )

  GRAPH ?g2 { ?p1 foaf:knows ?p2 }
  ?g2 dc:creator ex:me ; dc:created ?date2
  FILTER( xsd:dateTime(?date2) >= xsd:dateTime("2015-01-01") )

  GRAPH ?g3 { ?p2 foaf:name ?name2 }
  ?g3 dc:creator ex:me ; dc:created ?date3
  FILTER( xsd:dateTime(?date3) >= xsd:dateTime("2015-01-01") )
}

That code now does the right thing, but it rapidly becomes untenable as the query becomes more complex. If the main query has m triple patterns and the graph restriction has n, the complete query ends up with m×(n+1) triple patterns: the m original ones plus n restriction patterns for each of them.

Is there a better solution within standard SPARQL 1.1? I'm aware that some SPARQL engines will fetch a graph from its URI, and then you can make that URI the URL of a GET request to a SPARQL endpoint, but that's not standard. I had hoped the federated query mechanism might help, but it doesn't seem to.

Upvotes: 1

Views: 122

Answers (1)

ColinMaudry

Reputation: 143

As I don't have your data at hand, I can't test my query reliably. However, you seem to need subqueries.

Subqueries are a way to embed SPARQL queries within other queries, normally to achieve results which cannot otherwise be achieved, such as limiting the number of results from some sub-expression within the query.

In your case, the objective is:

  1. Getting the list of graphs for which you are the dc:creator
  2. For each of those graphs, find some possibly useful triples
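
As a rough, untested sketch, step 1 on its own might look like the query below. The PREFIX IRIs are assumptions on my part, since the question doesn't show them: dc: is mapped to the DCMI Terms namespace, and ex: is a placeholder. I've also written the cutoff with a time part, because the xsd:dateTime lexical form requires one, and I'm assuming the dc:created values in your data can be cast to xsd:dateTime.

```sparql
# Assumed namespaces; adjust them to whatever the dataset actually uses.
PREFIX dc:  <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex:  <http://example.org/>   # hypothetical placeholder for ex:me

# Step 1 only: list the graphs created by ex:me since the start of 2015.
SELECT ?graph WHERE {
  ?graph dc:creator ex:me ;
         dc:created ?date .
  FILTER( xsd:dateTime(?date) >= xsd:dateTime("2015-01-01T00:00:00") )
}
```

Running this by itself first is a cheap way to check that the graph restriction selects what you expect before embedding it as a subquery.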

What you want as a result of your query isn't very clear, since you mention only ?name after SELECT... although it's nowhere in the rest of the query. Here is how you could try using subqueries and maybe find inspiration to solve your problem:

SELECT ?name1 ?name2 WHERE {
  { GRAPH ?graph { ?p1 foaf:name ?name1 } }
  UNION { GRAPH ?graph { ?p1 foaf:knows ?p2 } }
  UNION { GRAPH ?graph { ?p2 foaf:name ?name2 } }
  # here we get the list of graphs you created
  {
    SELECT ?graph WHERE {
      ?graph dc:creator ex:me ; dc:created ?date
      FILTER( xsd:dateTime(?date) >= xsd:dateTime("2015-01-01") )
    }
  }
}

Upvotes: 1
