Reputation: 1171
I am querying a DataCube (RDF statistical data) that has 3 dimension and 1 measure. In this datacube, each observation is then composed of 4 statements (3 for the dimensions and 1 for the measure like the following exemple (that may be queried at http://kaiko.getalp.org/sparql).
SELECT distinct ?version ?lg ?relation ?count WHERE {
?o a qb:Observation;
qb:dataSet dbnstats:dbnaryNymRelationsCube;
dbnary:wiktionaryDumpVersion ?version;
dbnary:observationLanguage ?lg;
dbnary:nymRelation ?relation;
dbnary:count ?count.
}
The query returns something like:
version | lg | relation | count |
---|---|---|---|
"20210601" | "id" | antonym | 4 |
"20210601" | "id" | approximateSynonym | 0 |
"20210601" | "id" | hypernym | 0 |
"20210601" | "id" | synonym | 108 |
"20150602" | "id" | antonym | 2 |
"20150602" | "id" | approximateSynonym | 0 |
"20150602" | "id" | hypernym | 0 |
"20150602" | "id" | synonym | 36 |
"20150702" | "id" | antonym | 2 |
"20150702" | "id" | approximateSynonym | 0 |
"20150702" | "id" | hypernym | 0 |
"20150702" | "id" | synonym | 36 |
I'd like to pivot on the value of the relation to get the following table:
version | lg | antonym | approximateSynonym | hypernym | synonym |
---|---|---|---|---|---|
"20210601" | "id" | 4 | 0 | 0 | 108 |
"20150602" | "id" | 2 | 0 | 0 | 36 |
"20150702" | "id" | 2 | 0 | 0 | 36 |
I could not find a way to craft a single SPARQL query to get this. Currently, I need to fetch all data and make the pivot using whatever client language I use (here python).
Is this possible in SPARQL 1.1 ? How ?
I'd rather have a general answer, but the access point is currently served by Virtuoso.
Edit: To better explain my expectation. In DataCube Vocabulary, the structure of a DataCube is described that gives the different dimensions and measure (usually by the ontology). Hence, the dimensions and measures are considered to be known by the query developper (at least for a specific version of the ontology).
Here, the values of the nymRelation are not known in advance, they are part of the data and not of the structure. The Pivot operation seems to be a valid operation on a DataCube (along with slicing, projecting, etc.).
I would like to know if such an operation could be made on the server (through a generic query that will not depend on the actual data on the server). This would make it possible for a client to maintain a LAZY datacube object and postpone actual pivot operation when the results are indeed necessary.
I suspect (and first answers seems to imply) that this operation is not possible without either fetching the entire DataCube (to perform the operation in memory on client side) or fetching the actual distinct property values and automatically crafting a query that will depend on this first result.
Upvotes: 0
Views: 375
Reputation: 50190
You need to combine values from distinct observations. If hard-coding the relation names in the queries is not too impractical, you can write separate SELECT statements that bind a common value for ?version
and ?lg
to pull the counts into a single solution, like this:
SELECT ?version ?lg ?antonym ?approximateSynonym # ...
WHERE {
{
SELECT ?version ?lg ?antonym
WHERE
{
?o1 a qb:Observation;
qb:dataSet dbnstats:dbnaryNymRelationsCube;
dbnary:wiktionaryDumpVersion ?version;
dbnary:observationLanguage ?lg;
dbnary:nymRelation dbnary:antonym;
dbnary:count ?antonym . # <--- bind the antonym count value
}
}
{
SELECT ?version ?lg ?approximateSynonym
WHERE
{
?o2 a qb:Observation;
qb:dataSet dbnstats:dbnaryNymRelationsCube;
dbnary:wiktionaryDumpVersion ?version;
dbnary:observationLanguage ?lg;
dbnary:nymRelation dbnary:approximateSynonym;
dbnary:count ?approximateSynonym . # <--- bind the approximateSynonym count
}
}
# ... And so on for the other columns
}
This requires that all statistics are present for each version/language combination; otherwise there will be no solution for that combination.
If there are too many relation types, you can use the following CONSTRUCT query to aggregate the equivalent of each row into its own observation-like object. The different properties will be mapped to the same ?rowURI
. You can parse this result as RDF, or just deal with a json serialization if you prefer.
CONSTRUCT {
?rowURI
dbnary:wiktionaryDumpVersion ?version ;
dbnary:observationLanguage ?lg ;
?relation ?count
}
WHERE {
?o a qb:Observation;
qb:dataSet dbnstats:dbnaryNymRelationsCube;
dbnary:wiktionaryDumpVersion ?version;
dbnary:observationLanguage ?lg;
dbnary:nymRelation ?relation;
dbnary:count ?count.
BIND(URI(CONCAT("http://example.org/row/", ?lg, ?version)) AS ?rowURI)
}
Upvotes: 2