Reputation: 11
I have to check the consistency of a dataset regarding stated population; the sum of the population of all administrative units A of level L must be equal to the population of the administrative units B of level L+1 to which all administrative units A belong. This can be only in a single query. I've tried with the following code, but the result was nothing (condition doesn't seem to work). I guess that there is no result because of the "sum" time execution. What can I do to fix this?
prefix ns: <http://geo.linkedopendata.gr/gag/ontology/>
SELECT ?condition WHERE {
?x ns:has_official_name ?decentralized .
?x ns:has_population ?decentralized_population .
?y ns:has_official_name ?region .
?y ns:has_population ?region_population .
?z ns:has_official_name ?regional_unit .
?z ns:has_population ?regional_unit_population .
?z ns:belongs_to ?y .
?y ns:belongs_to ?x .
FILTER regex(str(?decentralized), \"DECENTRALIZED\") .
FILTER regex(str(?region), \"REGION\") .
FILTER regex(str(?regional_unit), \"REGIONAL UNIT\") .
FILTER (!regex(str(?region), \"REGIONAL\")) .
BIND(SUM(?decentralized_population) AS ?sum_decentralized_population) .
BIND(SUM(?region_population) AS ?sum_region_population) .
BIND(SUM(?regional_unit_population) AS ?sum_regional_unit_population) .
OPTIONAL {
BIND( IF(?sum_decentralized_population = ?sum_region_population &&
?sum_region_population = ?sum_regional_unit_population, 'TRUE', 'FALSE')
AS ?condition) .
}
}
GROUP BY ?condition
@Joshua, suppose we have 7 units at level L. These are:
A: with population:100
B: with population:150
C: with population:300
D: with population:200
E: with population:250
F: with population:150
G: with population:150
At level L+1 we have 13 units, where:
unit_1: has population 30 and belongs to A
unit_2: has population 70 and belongs to A
unit_3: has population 80 and belongs to B
unit_4: has population 70 and belongs to B
unit_5: has population 200 and belongs to C
unit_6: has population 100 and belongs to C
unit_7: has population 130 and belongs to D
unit_8: has population 70 and belongs to D
unit_9: has population 180 and belongs to E
unit_10: has population 70 and belongs to E
unit_11: has population 150 and belongs to F
unit_12: has population 100 and belongs to G
unit_13: has population 50 and belongs to G
At level L+2 we have 74 units, where:
unit_a: has population 15 and belongs to unit_1
unit_b: has population 5 and belongs to unit_1
unit_c: has population 3 and belongs to unit_1
unit_d: has population 7 and belongs to unit_1
unit_e: has population 40 and belongs to unit_1
unit_f: has population 20 and belongs to unit_1
unit_g: has population 10 and belongs to unit_1
and so on...(The amount of population is not the real one)
Now, I have to check if the population of unit A (100 at level L) is equal to:
the sum of population of unit_1 and unit_2 (30 +70 at level L+1)
the sum of population of unit_a, unit_b, unit_c, unit_d, unit_e, unit_f and unit_g (15 + 5 + 3 + 7 + 40 + 20 + 10 at level L+2)
.....................
I work with sesame in eclipse and my query is:
String queryStringAndy = "PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#" + "PREFIX gag: http://geo.linkedopendata.gr/gag/ontology/" +
" SELECT DISTINCT ?dec_pop (SUM(?reg_pop) AS ?sum_reg_pop) (SUM(?reg_unit_pop) AS ?sum_reg_unit_pop) (SUM(?mun_pop) AS ?sum_mun_pop)" +
" WHERE { ?x rdf:type gag:Decentralized_Unit ." +
" ?x gag:has_official_name ?decentralized . " +
" ?x gag:has_population ?dec_pop . " +
" ?y rdf:type gag:Region ." +
" ?y gag:has_official_name ?region . " +
" ?y gag:has_population ?reg_pop . " +
" ?z rdf:type gag:Regional_Unit ." +
" ?z gag:has_official_name ?regional_unit . " +
" ?z gag:has_population ?reg_unit_pop . " +
" ?w rdf:type gag:Municipality ." +
" ?w gag:has_official_name ?municipality . " +
" ?w gag:has_population ?mun_pop . " +
" ?y gag:belongs_to ?x ." +
" ?z gag:belongs_to ?y ." +
" ?w gag:belongs_to ?z .}" +
" GROUP BY ?dec_pop " ;
dec_pop: the population of units at level L
reg_pop: the population of units at level L+1
reg_unit_pop: the population of units at level L+2
mun_pop: the population of units at level L+3
The strange thing is that when I have to compare the ?dec_pop with only one sum expression, the result is correct. But when I put more sums, only the sum of the units at the last computing level is equal to ?dec_pop. I hope this is more clear now.
Upvotes: 1
Views: 426
Reputation: 85913
Your query as written isn't legal SPARQL (e.g., you shouldn't escape quotes) and you haven't shown us data, so it's hard to say what's actually going wrong in your query. However, what you're trying to do is achievable in SPARQL. E.g., suppose you've got this data where things can have children, and things can have counts:
@prefix : <urn:ex:>
:a :count 23 ;
:hasChild :b, :c .
:b :count 10 .
:c :count 13 ;
:hasChild :d, :e .
:d :count 6 .
:e :count 6 .
Now, the first thing is to retrieve the data that you want. You can retrieve each node with children, its declared count, and the sum of the count of its children with a query like this:
prefix : <urn:ex:>
select ?x ?count (sum(?_subcount) as ?subcount) where {
?x :count ?count ;
:hasChild/:count ?_subcount
}
group by ?x ?count
-------------------------
| x | count | subcount |
=========================
| :a | 23 | 23 |
| :c | 13 | 12 |
-------------------------
The declared value for :a is right, but :c's is wrong. You can select just the inconsistent values by filtering with having:
prefix : <urn:ex:>
select ?x ?count (sum(?_subcount) as ?subcount) where {
?x :count ?count ;
:hasChild/:count ?_subcount
}
group by ?x ?count
having (?count != ?subcount)
-------------------------
| x | count | subcount |
=========================
| :c | 13 | 12 |
-------------------------
It shouldn't be too hard to update this example to your actual data.
Upvotes: 3