Reputation: 201
A very elegant way to compute ranking into SPARQL has beeen suggested by Joshua Taylor in the answer to the following question How to rank values in SPARQL?
However, I need the Aggregation + Ranking as a first step to obtain the cumulative percent in queries such as "Countries that account for top 50% of the sales amount."
Suppose that we have sales at state level
California 30
Arizona 25
...
Alberta 25
Quebec 20
...
I would need to first aggregate by country and obtain the rank of the country as follows
United States 250 1
Canada 200 2
Mexico 150 3
...
How to write Aggregation + Rank in SPARQL generalizing Joshua's answer mentioned above ?
For the next step I would need to do the cumulative sum using the rank as follows
United States 250 1 250
Canada 200 2 450
Mexico 150 3 600
...
For writing this in SPARQL I guess I can use a SQL trick as reported here how to get cumulative sum
If the above succeeded, then we can try the cumulative percentage. Suppose now that the total sales for all countries is 1000. Then the result would be
United States 250 250
Canada 200 450
For writing this in SPARQL we can use something that in SQL would be like follows
SELECT CountryName, SalesAmount
FROM CumSalesCountry
WHERE CumSalesAmount <=
(SELECT MIN(CumSalesAmount) FROM CumSalesCountry
WHERE CumSalesAmount >=
(SELECT 0.5 * SUM(SalesAmount) FROM SalesCountry) ) )
Any help on this would be much appreciated
Esteban (struggling with writing BI-like queries in SPARQL ...)
Upvotes: 1
Views: 761
Reputation: 85823
Suppose you've got data like this:
@prefix : <urn:ex:> .
:a1 a :A ; :v 05 .
:a2 a :A ; :v 10 .
:b1 a :B ; :v 10 .
:b2 a :B ; :v 10 .
:b3 a :B ; :v 05 .
:c1 a :C ; :v 10 .
Then you can use a query like this:
prefix : <urn:ex:>
select
?type
(?value*100/?total as ?percent)
(count(?type2) as ?rank)
(sum(?value2)*100/?total as ?cumulativePercent)
where {
#-- total value across all types
{ select (sum(?value) as ?total)
where { ?x :v ?value } }
#-- each type and its sum value as ?type and ?value
{ select ?type (sum(?v) as ?value)
where { ?x a ?type ; :v ?v }
group by ?type }
#-- each type and its sum value as ?type2 and ?value2
{ select ?type2 (sum(?v) as ?value2)
where { ?x a ?type2 ; :v ?v }
group by ?type2 }
filter ( ?value2 >= ?value )
}
group by ?type ?value ?total
order by desc(?percent)
to get results like these:
---------------------------------------------
| type | percent | rank | cumulativePercent |
=============================================
| :B | 50.0 | 1 | 50.0 |
| :A | 30.0 | 2 | 80.0 |
| :C | 20.0 | 3 | 100.0 |
---------------------------------------------
Now, there is a problem here, if tow of the types have the same percentages. E.g., if you add the data
:d1 a :D ; :v 07 .
:d2 a :D ; :v 08 .
then you get the results:
---------------------------------------------------------------------------
| type | percent | rank | cumulativePercent |
===========================================================================
| :B | 38.461538461538461538461538 | 1 | 38.461538461538461538461538 |
| :A | 23.076923076923076923076923 | 3 | 84.615384615384615384615384 |
| :D | 23.076923076923076923076923 | 3 | 84.615384615384615384615384 |
| :C | 15.384615384615384615384615 | 4 | 99.999999999999999999999999 |
---------------------------------------------------------------------------
That shows that A and D are tied, and gives them the same cumulative percentage and rank. If that's not what you want, you can add something to the filter condition, e.g.,
filter ( ?value2 > ?value
|| ( ?value2 = ?value &&
str(?type2) >= str(?type) ))
to resolve the identical cases and get results like:
---------------------------------------------------------------------------
| type | percent | rank | cumulativePercent |
===========================================================================
| :B | 38.461538461538461538461538 | 1 | 38.461538461538461538461538 |
| :D | 23.076923076923076923076923 | 2 | 61.538461538461538461538461 |
| :A | 23.076923076923076923076923 | 3 | 84.615384615384615384615384 |
| :C | 15.384615384615384615384615 | 4 | 99.999999999999999999999999 |
---------------------------------------------------------------------------
Upvotes: 3
Reputation: 16630
SPARQL support subqueries which can be used for calculation of aggregates as part of an overall query:
SELECT * {
{ SELECT (MIN(?CumSalesAmount) AS ?minSalesAmount)
{ ... get ?CumSalesAmount ... } }
{ SELECT (0.5*sum(?SalesAmount) AS ?aggSalesAmount)
{ ... get ?SalesAmount ... } }
... get ?CountryName ?SalesAmount ...
}
The effects are not identical - it's a way of approaching the problem, not a translation clone of your example.
Upvotes: 1