Atieq ur Rehman

Reputation: 75

How to write SPARQL query to fetch counts based on outer subject

I am struggling to write a SPARQL query that fetches the list of products belonging to a given owner, along with a count of all the owners of each product.

The following is the query I expected to give me that result:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema:<http://schema.org/>
SELECT distinct ?uri ?label ?r ?ownership ?rating ?comments ?allOwners
FROM <http://xxxx.net/>
WHERE  {
  ?r rdf:type <http://schema.org/Relation> . 
  ?r schema:property ?uri.
  ?r schema:owner ?owner .
  ?r schema:ownership ?ownership .
  ?uri rdfs:label ?label .
  OPTIONAL {?r schema:comments ?comments .}
  OPTIONAL {?r schema:rating ?rating .}
  filter (?owner =<http://xxxx.net/resource/37654824-334f-4e57-a40c-4078cac9c579>)

{
    SELECT (count(distinct ?owner) as ?allOwners)
    FROM <http://xxxx.net/>
    where {
      ?relation rdf:type <http://schema.org/Relation> .
      ?relation schema:owner ?owner .
      ?relation schema:property ?uri .
    } group by ?uri
  }
}

But it returns duplicated rows with seemingly random count values.

How should I write such a query? I know the inner query runs before the outer one, but how can I make the inner query use the ?uri (subject) of each record in the outer result?

Upvotes: 0

Views: 228

Answers (1)

RobV

Reputation: 28675

SPARQL query semantics specify how the portions of a query are joined together: solutions are joined on the variables they share. Your sub-query does not project any variable that it shares with the outer query. It only SELECTs ?allOwners, which does not appear anywhere else in the query.

This means that you get a cross product of all the counts and all your other results; this is why you get duplicate rows and no correlations between the counts and rows.
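
As a minimal sketch of that behaviour, the two stand-alone queries below use inline VALUES data instead of your graph. The ex: prefix, the product names and the counts are made up for illustration. In the first query the sub-query projects only ?n, so nothing links it to ?product:

```sparql
PREFIX ex: <http://example.org/>

# Hypothetical data: two products, each with its own count.
SELECT ?product ?n
WHERE {
  VALUES ?product { ex:p1 ex:p2 }
  {
    # Only ?n is projected, so ?product inside this sub-query is hidden
    # from the outer query and there is no shared variable to join on.
    SELECT ?n
    WHERE { VALUES (?product ?n) { (ex:p1 3) (ex:p2 5) } }
  }
}
```

This returns four rows, every product paired with every count. Projecting the shared variable as well turns the cross product into a join:

```sparql
PREFIX ex: <http://example.org/>

SELECT ?product ?n
WHERE {
  VALUES ?product { ex:p1 ex:p2 }
  {
    # ?product is now projected, so the sub-query joins on it.
    SELECT ?product ?n
    WHERE { VALUES (?product ?n) { (ex:p1 3) (ex:p2 5) } }
  }
}
```

This returns two rows, each product with its own count.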

This kind of query can be achieved if you structure it correctly. Since you haven't shown the results you want, I'm making a best guess at what you are after. Something like the following may give the desired results:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema:<http://schema.org/>

SELECT distinct ?uri ?label ?r ?ownership ?rating ?comments ?allOwners
FROM <http://xxxx.net/>
WHERE  
{
  ?r rdf:type <http://schema.org/Relation> . 
  ?r schema:property ?uri.
  ?r schema:owner ?owner .
  ?r schema:ownership ?ownership .
  ?uri rdfs:label ?label .
  FILTER (?owner = <http://xxxx.net/resource/37654824-334f-4e57-a40c-4078cac9c579>)
  {
    SELECT ?uri (count(distinct ?owner) as ?allOwners)
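    # No FROM here: the SPARQL 1.1 grammar does not allow a dataset clause in a sub-query; the outer FROM applies to the whole query.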
    WHERE 
    {
      ?relation rdf:type <http://schema.org/Relation> .
      ?relation schema:owner ?owner .
      ?relation schema:property ?uri .
    } GROUP BY ?uri
  }
  OPTIONAL { ?r schema:comments ?comments . }
  OPTIONAL { ?r schema:rating ?rating . }
}

This differs from your original query as follows:

  • Puts the FILTER on ?owner earlier in the query to help the query engine apply it sooner.
    • FILTER position is usually pretty flexible, except when you are using nested graph patterns (like OPTIONAL or MINUS), in which case placing it after those clauses may apply it later than you intend.
    • As a general rule, put your FILTER clauses as soon as possible after all the variables they need have been introduced.
  • Adds the GROUP BY variable ?uri from your sub-query to the SELECT line of your sub-query.
    • This ensures that the query engine can correlate the ?allOwners count with the ?uri to which it pertains.
    • This also removes the cross product, which should remove the duplicate results and bad correlations.
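
As a small sketch of one way FILTER placement interacts with OPTIONAL, compare the two queries below. They assume, purely for illustration, that schema:rating holds numeric values and that 4 is a meaningful threshold; neither comes from your question. With the FILTER inside the OPTIONAL block, it only restricts the optional match:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX schema: <http://schema.org/>

# Every relation is returned; ?rating is bound only where it is >= 4.
SELECT ?r ?rating
WHERE {
  ?r rdf:type schema:Relation .
  OPTIONAL { ?r schema:rating ?rating . FILTER (?rating >= 4) }
}
```

With the FILTER in the outer group, it restricts the whole result:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX schema: <http://schema.org/>

# Only relations with a rating >= 4 are returned; rows where ?rating is
# unbound are dropped because the comparison raises an error.
SELECT ?r ?rating
WHERE {
  ?r rdf:type schema:Relation .
  OPTIONAL { ?r schema:rating ?rating . }
  FILTER (?rating >= 4)
}
```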

This may or may not be the query you are after, but hopefully it points you in the right direction.

Upvotes: 1
