Query separate lists of distinct variables in SPARQL?

Question

Say I have a query like this:

WHERE {
 up:classifiedWith ?annotation .
?protein up:classifiedWith ?annotation .
     up:annotation ?O3OET.
    ?O3OET a up:Topological_Domain_Annotation;
         rdfs:comment ?topology;
         up:range ?Q02UJ .
    ?protein a up:Protein .
    ?protein up:annotation ?otherTop .
    ?otherTop a up:Topological_Domain_Annotation;
             rdfs:comment ?topology;
             up:range ?OTHERRANGE .
     up:annotation ?S7IK0.
    ?S7IK0 a up:Pathway_Annotation ;
          rdfs:seeAlso ?pathway .
    ?protein a up:Protein .
    ?protein up:annotation ?VAR2 .
    ?VAR2 a up:Pathway_Annotation ;
          rdfs:seeAlso ?pathway .
 up:citation ?citation .
?protein up:citation ?citation .
}
GROUP BY ?protein

I'm trying to get the unique matches for each variable, without the full Cartesian product that SPARQL normally returns. That is, I want a list of the distinct values matched by each queried variable.

i.e., if there are 10 distinct proteins and 2 distinct annotations, how do I get just those results? Do I have to make separate queries?

Jeen Broekstra · Accepted Answer

There are several possible approaches to this.

Use a `CONSTRUCT` query

When selecting loads of different variables, you get a "Cartesian" result because you're representing multiple pattern matches as a tabular structure: each slightly different match gets its own 'row' in the result. A CONSTRUCT query does not return a tabular structure; it returns the subgraph of your data that matches the query pattern. Assuming you are using a library that has some decent support for RDF graph traversal, this might actually be easier and more natural to process than a complex SELECT query.
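As a rough sketch of what that could look like for a simplified part of the query in the question: the prefix declarations are assumptions (the question does not show them), and the pattern is cut down to just the classification and citation properties. In practice you would restrict `?protein` the same way the original query does.

```sparql
# Assumed prefixes; the question does not include its PREFIX declarations.
PREFIX up: <http://purl.uniprot.org/core/>

# Returns a graph instead of a table: each matching triple appears once,
# no matter how many combinations of ?annotation and ?citation exist.
CONSTRUCT {
  ?protein up:classifiedWith ?annotation ;
           up:citation ?citation .
}
WHERE {
  ?protein a up:Protein ;
           up:classifiedWith ?annotation ;
           up:citation ?citation .
}
```

Because the result is a set of triples, a protein with 2 annotations and 5 citations contributes 7 triples rather than 10 rows.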

Use `GROUP_CONCAT`

You can use the GROUP_CONCAT aggregate operator to produce a result where multiple values for a variable are concatenated into a single string. For example, if you previously had this:

  SELECT ?protein ?annotation
   ....

and you got back something like this:

protein1 annotation1
protein1 annotation2
protein2 annotation3
protein2 annotation4
...

you can use this instead:

SELECT ?protein (GROUP_CONCAT(?annotation) as ?annotations)

and your result will look like this:

protein1 "annotation1 annotation2"
protein2 "annotation3 annotation4"
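A fuller sketch of the same idea, again with an assumed prefix and a simplified pattern. When more than one variable is aggregated, `DISTINCT` inside `GROUP_CONCAT` keeps the cross product from repeating values, and `STR()` converts IRIs to strings before concatenation, which some engines require.

```sparql
# Assumed prefix; the question does not include its PREFIX declarations.
PREFIX up: <http://purl.uniprot.org/core/>

SELECT ?protein
       (GROUP_CONCAT(DISTINCT STR(?annotation); separator=" ") AS ?annotations)
       (GROUP_CONCAT(DISTINCT STR(?citation); separator=" ") AS ?citations)
WHERE {
  ?protein a up:Protein ;
           up:classifiedWith ?annotation ;
           up:citation ?citation .
}
# One result row per protein; every other selected variable is aggregated.
GROUP BY ?protein
```

The separator defaults to a single space; pick a different one if the values themselves can contain spaces.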

Use multiple queries

Another option is to use multiple queries: the first query just retrieves the resource identifiers (the proteins, in your case). Then you iterate over the result and for each resource identifier, do a followup query that gets the additional attributes of interest for that particular resource.
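A minimal sketch of the two steps, assuming the same prefix as above. The first query only lists the proteins:

```sparql
# Assumed prefix; the question does not include its PREFIX declarations.
PREFIX up: <http://purl.uniprot.org/core/>

# Step 1: fetch only the resource identifiers.
SELECT DISTINCT ?protein
WHERE {
  ?protein a up:Protein .
}
```

The follow-up query is then run once per protein returned by the first. The IRI in the `VALUES` clause is a hypothetical placeholder that your client code would replace on each iteration:

```sparql
PREFIX up: <http://purl.uniprot.org/core/>

# Step 2: fetch one attribute for a single protein.
SELECT DISTINCT ?annotation
WHERE {
  # Placeholder IRI; substitute each ?protein value from step 1.
  VALUES ?protein { <http://example.org/protein1> }
  ?protein up:classifiedWith ?annotation .
}
```

Running a separate follow-up query per attribute (annotations, citations, and so on) keeps each result a plain list of distinct values, at the cost of more round trips to the endpoint.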
