Reputation: 5
Say I have a query like this:
WHERE {
<http://purl.uniprot.org/uniprot/Q8NAT1> up:classifiedWith ?annotation .
?protein up:classifiedWith ?annotation .
<http://purl.uniprot.org/uniprot/Q8NAT1> up:annotation ?O3OET.
?O3OET a up:Topological_Domain_Annotation;
rdfs:comment ?topology;
up:range ?Q02UJ .
?protein a up:Protein .
?protein up:annotation ?otherTop .
?otherTop a up:Topological_Domain_Annotation;
rdfs:comment ?topology;
up:range ?OTHERRANGE .
<http://purl.uniprot.org/uniprot/Q8NAT1> up:annotation ?S7IK0.
?S7IK0 a up:Pathway_Annotation ;
rdfs:seeAlso ?pathway .
?protein a up:Protein .
?protein up:annotation ?VAR2 .
?VAR2 a up:Pathway_Annotation ;
rdfs:seeAlso ?pathway .
<http://purl.uniprot.org/uniprot/Q8NAT1> up:citation ?citation .
?protein up:citation ?citation .
}
GROUP BY ?protein
I want the unique values matched by each variable, without the full Cartesian product that a SPARQL SELECT normally returns. In other words, I want a list of all distinct matches for each queried variable.
For example, if there are 10 distinct proteins and 2 distinct annotations, how do I get just those values back? Do I have to make separate queries?
Upvotes: 0
Views: 100
Reputation: 22042
There are several possible approaches to this.
CONSTRUCT query
When selecting loads of different variables, you get a "Cartesian" result because you're representing multiple pattern matches as a tabular structure: each slightly different match gets its own 'row' in the result. A CONSTRUCT query does not return a tabular structure, but returns the subgraph that matches your data. Assuming you are using a library that has some decent support for RDF graph traversal, this might actually be easier and more natural to process than a complex SELECT query.
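To illustrate why a graph result sidesteps the Cartesian blow-up, here is a minimal Python sketch. It assumes the CONSTRUCT result has already been parsed into plain (subject, predicate, object) tuples; a real RDF library would give you richer objects, and the triples below are made-up placeholders rather than real UniProt data.

```python
from collections import defaultdict


def distinct_values_by_predicate(triples):
    """Collect the distinct objects seen for each predicate in a set of triples."""
    values = defaultdict(set)
    for _subject, predicate, obj in triples:
        values[predicate].add(obj)
    # Sort so the output is stable and easy to read.
    return {predicate: sorted(objs) for predicate, objs in values.items()}


# Hypothetical triples standing in for a parsed CONSTRUCT result.
triples = [
    ("protein1", "up:classifiedWith", "annotation1"),
    ("protein1", "up:classifiedWith", "annotation2"),
    ("protein1", "up:citation", "citation1"),
    ("protein2", "up:classifiedWith", "annotation1"),
    ("protein2", "up:citation", "citation1"),
]

result = distinct_values_by_predicate(triples)
```

Each triple appears once in the graph, so every value is counted once per predicate no matter how many other patterns matched alongside it.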
GROUP_CONCAT
You can use the GROUP_CONCAT aggregate operator to produce a result where multiple values for a variable are concatenated into a single string. For example, if you previously had this:
SELECT ?protein ?annotation
....
and you got back something like this:
protein1 annotation1
protein1 annotation2
protein2 annotation3
protein2 annotation4
...
you can use this instead (the aggregate needs the GROUP BY ?protein clause you already have):
SELECT ?protein (GROUP_CONCAT(?annotation) AS ?annotations)
....
GROUP BY ?protein
and your result will look like this:
protein1 "annotation1 annotation2"
protein2 "annotation3 annotation4"
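On the client side, the concatenated string can be split back into a list of distinct values. The sketch below assumes the endpoint returns the standard SPARQL JSON results format and that the default single-space separator was used; the function name and the sample payload are made up for illustration. Splitting on a space is only safe for values such as IRIs that cannot contain one.

```python
import json


def split_concatenated(results_json, key_var, concat_var, separator=" "):
    """Map each key value to the distinct items in its GROUP_CONCAT string."""
    data = json.loads(results_json)
    out = {}
    for binding in data["results"]["bindings"]:
        key = binding[key_var]["value"]
        raw = binding.get(concat_var, {}).get("value", "")
        # dict.fromkeys drops duplicates while keeping first-seen order.
        out[key] = list(dict.fromkeys(v for v in raw.split(separator) if v))
    return out


# Hypothetical response in the SPARQL 1.1 Query Results JSON format.
sample = json.dumps({
    "head": {"vars": ["protein", "annotations"]},
    "results": {"bindings": [
        {"protein": {"type": "uri", "value": "protein1"},
         "annotations": {"type": "literal",
                         "value": "annotation1 annotation2 annotation1"}},
        {"protein": {"type": "uri", "value": "protein2"},
         "annotations": {"type": "literal",
                         "value": "annotation3 annotation4"}},
    ]},
})

parsed = split_concatenated(sample, "protein", "annotations")
```

If the duplicates should be removed by the endpoint rather than the client, GROUP_CONCAT also accepts DISTINCT and an explicit separator, e.g. GROUP_CONCAT(DISTINCT ?annotation; separator="|").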
Another option is to use multiple queries: the first query just retrieves the resource identifiers (the proteins, in your case). Then you iterate over the result and for each resource identifier, do a followup query that gets the additional attributes of interest for that particular resource.
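A rough Python sketch of that two-step loop follows. The run_select callable is an assumption: it stands for whatever your SPARQL client offers to run a SELECT query and return rows as dictionaries keyed by variable name. The fake_run_select stub and its example.org IRIs are placeholders so the sketch runs without an endpoint, and the follow-up query only fetches up:classifiedWith values as one example attribute.

```python
PREFIX = "PREFIX up: <http://purl.uniprot.org/core/>\n"

# Step 1: fetch only the resource identifiers.
PROTEIN_QUERY = PREFIX + "SELECT DISTINCT ?protein WHERE { ?protein a up:Protein }"


def annotation_query(protein_iri):
    """Step 2 query: the distinct annotations of one specific protein."""
    return PREFIX + (
        "SELECT DISTINCT ?annotation WHERE { <%s> up:classifiedWith ?annotation }"
        % protein_iri
    )


def collect_annotations(run_select):
    """Run the identifier query, then one follow-up query per protein."""
    annotations = {}
    for row in run_select(PROTEIN_QUERY):
        protein = row["protein"]
        rows = run_select(annotation_query(protein))
        annotations[protein] = sorted({r["annotation"] for r in rows})
    return annotations


def fake_run_select(query):
    """Stand-in for a real client; returns canned rows (hypothetical data)."""
    canned = {
        "http://example.org/protein1": ["annotation1", "annotation2", "annotation1"],
        "http://example.org/protein2": ["annotation3"],
    }
    if "?protein a up:Protein" in query:
        return [{"protein": iri} for iri in canned]
    for iri, values in canned.items():
        if "<%s>" % iri in query:
            return [{"annotation": v} for v in values]
    return []


demo = collect_annotations(fake_run_select)
```

This costs one extra round trip per protein, so it tends to suit cases where the first query returns a modest number of identifiers.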
Upvotes: 1