Adrian Gschwend

SPARQL counting triples in multiple graphs

I want to count the triples in multiple graphs whose subjects are of a certain class, and sum the results. I manage to count the matches for this class in each graph, but I can't work out how to calculate the total.

Initial query:

PREFIX locah: <http://data.archiveshub.ac.uk/def/>
PREFIX bs: <http://localhost:3030/alod/bs>
PREFIX ne: <http://localhost:3030/alod/ne>

SELECT (count(?sBS) as ?BScount) (count(?sNE) AS ?NEcount) WHERE {
  {
    GRAPH bs: {
      ?sBS a locah:ArchivalResource
    }
  } UNION {
    GRAPH ne: {
      ?sNE a locah:ArchivalResource
    }
  }
}    

My idea was to simply use the SUM() function as well, so the SELECT would look like this:

SELECT (count(?sBS) as ?BScount) (count(?sNE) AS ?NEcount) (SUM(?NEcount ?BScount) AS ?total )WHERE {

But that doesn't seem to work.

And a related question: why do I need the UNION? If I run the query without it, I get a very high count that doesn't make much sense, and both counts are the same.

You can try it on my SPARQL endpoint: http://data.alod.ch/sparql/

Answers (1)

Joshua Taylor

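When you use an aggregate in the projection, the solutions are partitioned, or grouped, by distinct values of some of the variables. If you don't specify a GROUP BY clause, the grouping is implicit and all solutions fall into a single group. Note also that SUM is itself an aggregate over the rows of a group, not an operator for adding two values, which is why SUM(?NEcount ?BScount) fails; to add two counts, use + on their aliases. You have (at least) two options here. One is to use two subqueries, as in: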

select ?acount ?bcount (?acount + ?bcount as ?totalCount) where {
  { select (count(*) as ?acount) where {
      graph :a { ... } } }
  { select (count(*) as ?bcount) where {
      graph :b { ... } } }
}
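As a quick sanity check of the subquery pattern, the query below replaces the two graph patterns with inline VALUES data, so it can run on any SPARQL 1.1 endpoint without a dataset. The `:` namespace and the IRIs `:a1`, `:a2`, `:b1`, `:b2`, `:b3` are hypothetical placeholders. Because each subquery returns exactly one row, joining them yields a single row and no Cartesian blow-up.

```sparql
PREFIX : <http://example.org/>   # hypothetical placeholder namespace

SELECT ?acount ?bcount (?acount + ?bcount AS ?totalCount)
WHERE {
  # Each subquery aggregates on its own and returns exactly one row
  { SELECT (COUNT(*) AS ?acount) WHERE { VALUES ?a { :a1 :a2 } } }
  { SELECT (COUNT(*) AS ?bcount) WHERE { VALUES ?b { :b1 :b2 :b3 } } }
}
# Expected single row: ?acount = 2, ?bcount = 3, ?totalCount = 5
```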

I think that's probably the simplest and most self-explanatory option. The other option, as you've noted, is to use a union:

select (count(?a) as ?acount)
       (count(?b) as ?bcount)
       (?acount + ?bcount as ?totalCount)
where {
  { graph :a { ?a ... } }
  union
  { graph :b { ?b ... } }
}
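The union works because each of its rows binds only one of the two variables, and COUNT(?var) skips rows where ?var is unbound. A minimal sketch with inline data (hypothetical `:` namespace and IRIs) makes that visible:

```sparql
PREFIX : <http://example.org/>   # hypothetical placeholder namespace

SELECT (COUNT(?a) AS ?acount)   # counts only rows where ?a is bound
       (COUNT(?b) AS ?bcount)   # counts only rows where ?b is bound
       (COUNT(*)  AS ?rows)     # counts every row of the union
WHERE {
  { VALUES ?a { :a1 :a2 } }
  UNION
  { VALUES ?b { :b1 :b2 :b3 } }
}
# Expected: ?acount = 2, ?bcount = 3, ?rows = 5
```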

The reason that something like

select (count(?a) as ?acount)
       (count(?b) as ?bcount)
       (?acount + ?bcount as ?totalCount)
where {
  graph :a { ?a ... }
  graph :b { ?b ... }
}

doesn't work is that you end up with the Cartesian product of ?a and ?b values. That is, suppose that there are two values for ?a and three values for ?b. Then you end up with six rows in the table:

a1, b1  
a1, b2 
a1, b3
a2, b1  
a2, b2 
a2, b3
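The six-row table above can be reproduced with inline data (again with a hypothetical `:` namespace). Two VALUES blocks in the same group are joined, and since they share no variable, the join is a Cartesian product:

```sparql
PREFIX : <http://example.org/>   # hypothetical placeholder namespace

SELECT (COUNT(?a) AS ?acount) (COUNT(?b) AS ?bcount)
WHERE {
  # No shared variable, so every ?a is paired with every ?b (2 x 3 rows)
  VALUES ?a { :a1 :a2 }
  VALUES ?b { :b1 :b2 :b3 }
}
# Expected: ?acount = 6, ?bcount = 6
```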

With the implicit grouping, all six rows fall into a single group, so count(?a) and count(?b) both return 6, which isn't what you want. You can still use this pattern, however, if you count distinct values:

select (count(distinct ?a) as ?acount)
       (count(distinct ?b) as ?bcount)
       (?acount + ?bcount as ?totalCount)
where {
  #-- ...
}
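Applied to the same two-by-three inline data (hypothetical `:` namespace), COUNT(DISTINCT ...) recovers the real sizes, although the join itself still produces all six combinations before they are counted:

```sparql
PREFIX : <http://example.org/>   # hypothetical placeholder namespace

SELECT (COUNT(DISTINCT ?a) AS ?acount)   # 2 distinct values among 6 rows
       (COUNT(DISTINCT ?b) AS ?bcount)   # 3 distinct values among 6 rows
       (?acount + ?bcount AS ?totalCount)
WHERE {
  VALUES ?a { :a1 :a2 }
  VALUES ?b { :b1 :b2 :b3 }
}
# Expected: ?acount = 2, ?bcount = 3, ?totalCount = 5
```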

Types of Queries

Type 1: Subqueries

SELECT ?BScount ?NEcount (?BScount + ?NEcount as ?totalCount)
WHERE {
  { select (count(*) as ?BScount) WHERE {
      GRAPH bs: { ?sBS a locah:ArchivalResource }
    } }
  { select (count(*) as ?NEcount) WHERE {
      GRAPH ne: { ?sNE a locah:ArchivalResource }
    } }
}
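If one row per graph is acceptable instead of one column per graph, a variant worth trying binds the graph name to a variable with VALUES and groups by it. This is a sketch reusing the prefixes from the question; the variable names ?g and ?count are arbitrary choices:

```sparql
PREFIX locah: <http://data.archiveshub.ac.uk/def/>
PREFIX bs: <http://localhost:3030/alod/bs>
PREFIX ne: <http://localhost:3030/alod/ne>

SELECT ?g (COUNT(?s) AS ?count)
WHERE {
  VALUES ?g { bs: ne: }                      # the graphs to inspect
  GRAPH ?g { ?s a locah:ArchivalResource }
}
GROUP BY ?g                                  # one group, and one row, per graph
```

Removing ?g from the SELECT clause and dropping GROUP BY ?g puts all matches into the implicit single group, so COUNT(?s) then returns the combined total directly. A resource typed in both graphs is counted once per graph, just as in ?BScount + ?NEcount. With the explicit GROUP BY, a graph that has no matching resources produces no row at all rather than a count of 0.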

Type 2: Union

SELECT (count(?sBS) as ?BScount)
       (count(?sNE) AS ?NEcount)
       (?BScount + ?NEcount as ?totalCount)
WHERE {
  { GRAPH bs: { ?sBS a locah:ArchivalResource } }
  UNION
  { GRAPH ne: { ?sNE a locah:ArchivalResource } }
}
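Since every row of the union comes from exactly one of the two graphs, the total can also be taken as COUNT(*) instead of adding the two aliases. A sketch with the question's prefixes:

```sparql
PREFIX locah: <http://data.archiveshub.ac.uk/def/>
PREFIX bs: <http://localhost:3030/alod/bs>
PREFIX ne: <http://localhost:3030/alod/ne>

SELECT (COUNT(?sBS) AS ?BScount)
       (COUNT(?sNE) AS ?NEcount)
       (COUNT(*)    AS ?totalCount)   # one union row per match in either graph
WHERE {
  { GRAPH bs: { ?sBS a locah:ArchivalResource } }
  UNION
  { GRAPH ne: { ?sNE a locah:ArchivalResource } }
}
```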

Type 3: Cartesian Product and Distinct

SELECT (count(distinct ?sBS) as ?BScount)
       (count(distinct ?sNE) AS ?NEcount)
       (?BScount + ?NEcount as ?totalCount)
WHERE {
  { GRAPH bs: { ?sBS a locah:ArchivalResource } }
  { GRAPH ne: { ?sNE a locah:ArchivalResource } }
}
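One caveat with the Cartesian-product variant: if either graph has no matching resources, the join has no rows at all, so both counts drop to 0 even though the other graph is not empty. A minimal inline-data sketch (hypothetical `:` namespace; `VALUES ?b { }` stands in for an empty graph) shows the effect on a spec-conforming engine:

```sparql
PREFIX : <http://example.org/>   # hypothetical placeholder namespace

SELECT (COUNT(DISTINCT ?a) AS ?acount)
       (COUNT(DISTINCT ?b) AS ?bcount)
WHERE {
  VALUES ?a { :a1 :a2 }
  VALUES ?b { }          # zero rows, standing in for an empty graph
}
# Expected: ?acount = 0, ?bcount = 0 (not 2 and 0)
```

The subquery and union variants do not have this problem, because neither joins the two graphs' matches against each other.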
