Reputation: 83
I have ingested the Geonames RDF dump (https://download.geonames.org/all-geonames-rdf.zip) into a Virtuoso instance, and I've been running queries against it with varying degrees of success. However, I've found that certain objects have the incorrect datatype. For example, population is encoded using xsd:string, and therefore trying to sort by population ends up sorting the results in lexicographic order:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX gn: <http://www.geonames.org/ontology#>
SELECT ?country ?name ?population (datatype(?population) AS ?type)
WHERE {
?country a gn:Feature .
?country gn:name ?name .
# A.PCLI is feature code for 'independent political entity'
?country gn:featureCode <https://www.geonames.org/ontology#A.PCLI> .
?country gn:population ?population .
}
ORDER BY DESC(?population)
LIMIT 10
country | name | population | type |
---|---|---|---|
https://sws.geonames.org/1814991/ | China | 1330044000 | http://www.w3.org/2001/XMLSchema#string |
https://sws.geonames.org/1269750/ | India | 1173108018 | http://www.w3.org/2001/XMLSchema#string |
https://sws.geonames.org/6252001/ | United States | 310232863 | http://www.w3.org/2001/XMLSchema#string |
https://sws.geonames.org/1643084/ | Indonesia | 242968342 | http://www.w3.org/2001/XMLSchema#string |
https://sws.geonames.org/3469034/ | Brazil | 201103330 | http://www.w3.org/2001/XMLSchema#string |
I know I can cast the variable to get the correct result, like so: ORDER BY DESC(xsd:integer(?population)). But once my queries get more complicated, this no longer works, specifically when I run subqueries and use their results to apply further logic. For example:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX gn: <http://www.geonames.org/ontology#>
SELECT ?cityName ?countryName ?population (datatype(?population) AS ?type)
WHERE
{
?city gn:parentCountry ?country ;
gn:population ?population ;
gn:name ?cityName .
?country gn:name ?countryName .
{
# a) SELECT ?country (MAX(?population) AS ?population)
# b) SELECT ?country (MAX(xsd:integer(?population)) AS ?population)
# c) SELECT ?country (xsd:string(MAX(xsd:integer(?population))) AS ?population)
WHERE
{
?city a gn:Feature ;
gn:featureClass <https://www.geonames.org/ontology#P> ;
gn:population ?population ;
gn:parentCountry ?country .
}
GROUP BY ?country
ORDER BY DESC(?population)
}
}
Select a returns the populations in lexicographic order, as before.
Select b orders the populations correctly, but because the subquery now returns integers, I can no longer match the city on population outside the subquery: I would be comparing strings with integers. So b returns an empty result set.
Select c was my attempt to cast the results back to strings so that they match outside the subquery, but it ends in a timeout (estimated 4000 second execution time).
My question is this: Is there a way to either
a) change the datatype in Virtuoso manually
b) use the Geonames ontology to instruct Virtuoso about the correct types
c) alter my query to more efficiently cast to the correct type
I'm hoping option b is possible, as it seems the most effective solution: the Geonames ontology already specifies the correct datatype for the object of each of these predicates.
You can find the Geonames ontology here.
You can test the queries above and your own against our endpoint here:
http://18.170.45.162:8890/sparql
Upvotes: 0
Views: 159
Reputation: 9434
Another option, based on your query (a) --
Put your CAST into the SELECT, as --
SELECT ?cityName
?countryName
?population
(xsd:integer(?population) AS ?pop)
(datatype(?population) AS ?PopDataType)
and then ORDER BY DESC (4) (for the 4th variable in your SELECT). This retains the data as inserted, which may be valuable in some scenarios.
Here's the full query --
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX gn: <http://www.geonames.org/ontology#>
SELECT ?cityName
?countryName
?population
(xsd:integer(?population) AS ?pop)
(datatype(?population) AS ?PopDataType)
WHERE
{
?city gn:parentCountry ?country ;
gn:population ?population ;
gn:name ?cityName .
?country gn:name ?countryName .
{
# a)
SELECT ?country (MAX(?population) AS ?population)
# b) SELECT ?country (MAX(xsd:integer(?population)) AS ?population)
# c) SELECT ?country (xsd:string(MAX(xsd:integer(?population))) AS ?population)
WHERE
{
?city a gn:Feature ;
gn:featureClass <https://www.geonames.org/ontology#P> ;
gn:population ?population ;
gn:parentCountry ?country .
}
GROUP BY ?country
}
}
ORDER BY DESC (4)
-- and results.
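One caveat with subquery (a): MAX over xsd:string values picks the lexicographically largest string, which is not necessarily the largest population. If the goal is the most populous place per country, a possible variant is to aggregate on the cast value under a separate name and compare as integers in the outer pattern, leaving the stored strings untouched. This is only a sketch in standard SPARQL 1.1. The variable names ?cityPop, ?maxPop and ?c are my own, and I have not checked how it performs on your instance or whether Release 6.1 accepts it.

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX gn:  <http://www.geonames.org/ontology#>

SELECT ?cityName ?countryName ?population
WHERE
  {
    ?city     gn:parentCountry  ?country ;
              gn:population     ?population ;
              gn:name           ?cityName .
    ?country  gn:name           ?countryName .

    # Compare as integers; ?population itself stays the stored string
    FILTER ( xsd:integer(?population) = ?maxPop )

    {
      # Numeric maximum per country (hypothetical names ?c, ?cityPop, ?maxPop)
      SELECT ?country (MAX(xsd:integer(?cityPop)) AS ?maxPop)
      WHERE
        {
          ?c  a                 gn:Feature ;
              gn:featureClass   <https://www.geonames.org/ontology#P> ;
              gn:population     ?cityPop ;
              gn:parentCountry  ?country .
        }
      GROUP BY ?country
    }
  }
ORDER BY DESC(?maxPop)
```

Note that the outer pattern does not repeat the featureClass restriction, so any feature in the country with exactly that population would also match. Add the same gn:featureClass triple to the outer pattern if that matters.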
You might also consider upgrading your Virtuoso from Release 6.1 (06.01.3127), which hasn't been updated since roughly February 2010 (though your binary was apparently compiled in 2019), to a current build of Release 7 from the much more recent codebase. This is vital if you intend to perform GeoSPARQL queries, as this support was not added until Release 7.1, circa 2018!
UPDATED TO ADD:
To get full GeoSPARQL functionality from Virtuoso, it's necessary to follow the "Virtuoso GeoSPARQL support" guidance on the Open Source GitHub Project and/or the Virtuoso Geospatial Enhancements section of the Virtuoso Product Manual. The "Virtuoso GeoSPARQL Demo Server" article in the Community Forum can also be helpful, as can testing on the GeoSPARQL Demo instance.
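Coming back to option (a) in the question, changing the stored datatype: in principle this can be done once with a SPARQL 1.1 Update that replaces each string-typed population with its integer cast. The following is a sketch only. The graph IRI <http://example.org/geonames> is a placeholder for wherever the dump was loaded, I have not run this against your data, and an older release may not accept BIND. On a dataset this size it may also need to be run in batches, and it is worth trying on a copy first.

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX gn:  <http://www.geonames.org/ontology#>

# Placeholder graph IRI; replace with the graph that holds the Geonames data
WITH <http://example.org/geonames>
DELETE { ?s gn:population ?old }
INSERT { ?s gn:population ?new }
WHERE
  {
    ?s gn:population ?old .
    # Only touch values that are still typed as strings
    FILTER ( datatype(?old) = xsd:string )
    BIND ( xsd:integer(?old) AS ?new )
    # Skip values that fail to cast, so nothing is deleted without a replacement
    FILTER ( BOUND(?new) )
  }
```

After such an update, ORDER BY DESC(?population) and MAX(?population) would operate on integers without any cast in the query.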
Upvotes: 0