bcoughlan

Reputation: 26617

SPARQL counting occurrences in multiple graphs

I'm trying to write a SPARQL query that counts the occurrences of an object in multiple graphs. Sample data and expected output below:

Named graph g1:

@prefix g1: <http://example.com/g1#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
g1:ex1 rdfs:label "a" .
g1:ex2 rdfs:label "a" .
g1:ex3 rdfs:label "b" .
g1:ex3 rdfs:label "d" .

Named graph g2:

@prefix g2: <http://example.com/g2#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
g2:ex1 rdfs:label "a" .
g2:ex2 rdfs:label "b" .
g2:ex3 rdfs:label "c" .

Expected output of SPARQL query:

?label ?g1count ?g2count
a      2        1
b      1        1
c      0        1
d      1        0

I can get the total count for both graphs by doing a union of the rdfs:labels and counting occurrences:

prefix g1: <http://example.com/g1#>
prefix g2: <http://example.com/g2#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?label (count(?label) as ?count)
{
  {
    GRAPH g1: {
      ?s rdfs:label ?label
    }
  } UNION {
    GRAPH g2:
    {
      ?s rdfs:label ?label
    }
  }
}
GROUP BY ?label

I thought that from here I could use subqueries within each UNION block to get the individual counts, but besides the probable inefficiency of such a query, I have not had any luck getting the expected results.

Upvotes: 5

Views: 1653

Answers (2)

user205512

Reputation: 8878

Golfing here from RobV's answer (too big for a comment):

prefix g1: <http://example.com/g1#>
prefix g2: <http://example.com/g2#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?label (count(?s1) as ?g1count) (count(?s2) AS ?g2count)
{
  {
    GRAPH g1: {
      ?s1 rdfs:label ?label
    }
  } UNION {
    GRAPH g2: {
      ?s2 rdfs:label ?label
    }
  }
} group by ?label order by ?label

Result:

-----------------------------
| label | g1count | g2count |
=============================
| "a"   | 2       | 1       |
| "b"   | 1       | 1       |
| "c"   | 0       | 1       |
| "d"   | 1       | 0       |
-----------------------------

Upvotes: 4

RobV

Reputation: 28646

You can take advantage of the fact that the COUNT function ignores unbound values and give the variable in each UNION branch a different name, i.e.:

prefix g1: <http://example.com/g1#>
prefix g2: <http://example.com/g2#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?label (count(?label1) as ?g1count) (count(?label2) AS ?g2count)
{
  {
    GRAPH g1: {
      ?s rdfs:label ?label1
    }
  } UNION {
    GRAPH g2:
    {
      ?s rdfs:label ?label2
    }
  }
}
# Group on whichever label variable is bound in the current row
GROUP BY (COALESCE(?label1, ?label2) AS ?label)
ORDER BY ?label

The COALESCE function merges the two label variables into a single ?label, since it returns the first of its arguments that is bound. Each row comes from only one UNION branch, so exactly one of ?label1 and ?label2 is bound in any given row.
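
To try this without loading any data, here is a minimal sketch that swaps the two GRAPH patterns for inline VALUES rows copied from the question's sample data. The VALUES blocks and the GROUP BY ... AS form are my own assumptions for illustration, not part of the query above. Each UNION branch binds only one of ?label1 and ?label2, so each COUNT only sees the rows from its own branch:

```sparql
PREFIX g1: <http://example.com/g1#>
PREFIX g2: <http://example.com/g2#>

SELECT ?label (COUNT(?label1) AS ?g1count) (COUNT(?label2) AS ?g2count)
WHERE {
  {
    # Stand-in for GRAPH g1: these rows bind ?label1 and leave ?label2 unbound
    VALUES (?s ?label1) {
      (g1:ex1 "a") (g1:ex2 "a") (g1:ex3 "b") (g1:ex3 "d")
    }
  } UNION {
    # Stand-in for GRAPH g2: these rows bind ?label2 and leave ?label1 unbound
    VALUES (?s ?label2) {
      (g2:ex1 "a") (g2:ex2 "b") (g2:ex3 "c")
    }
  }
}
# COALESCE picks whichever label is bound, giving one group per distinct label
GROUP BY (COALESCE(?label1, ?label2) AS ?label)
ORDER BY ?label
```

On a SPARQL 1.1 engine this should return the same four rows as the expected output in the question, with 0 for a label that never appears in one of the branches.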

Upvotes: 2
