Reputation: 28968
Starting with the example shown in https://stackoverflow.com/a/57856942/841830 I then grouped them to only have one row per country. (It is 208 results in 21600ms; the original was 53,546 results in 19484ms, so it hasn't slowed it down much.)
SELECT ?country (GROUP_CONCAT(?label; SEPARATOR=";") AS ?labels) (GROUP_CONCAT(lang(?label); SEPARATOR=";") as ?label_langs) ?countryLabel
WHERE
{
?country wdt:P31 wd:Q6256.
?country rdfs:label ?label .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?country ?countryLabel
Note that I can't use DISTINCT
because labels would get merged, but language codes would not, so they would no longer be in a 1-1 relationship.
As an example, here is a cut-down version (the first 16 labels) of what I get for Japan (Q17):
日本;Japan;Japó;ژاپن;Japan;Japan;Giappone;ಜಪಾನ್;Japon;Япония;Japan;Hapon;Hapon;Japonia;Japón;Japan;
ja;nl;ca;fa;en;de;it;kn;fr;ru;en-gb;ilo;tl;ro;es;din
It'd be nice to append the language code directly to each label, so I only have a single string. So it would look like:
日本@ja;Japan@nl;Japó@ca;ژاپن@fa;Japan@en;Japan@de;Giappone@it;ಜಪಾನ್@kn;Japon@fr;Япония@ry;Japan@en-gb;Hapon@ilo;Hapon@tl;Japonia@ro;Japón@es;Japan@din;
But what I'd really like is to merge duplicates in two ways:
So like this:
日本@ja;Japó@ca;ژاپن@fa;Giappone@it;ಜಪಾನ್@kn;Japon@fr;Япония@ry;Hapon@ilo,tl;Japonia@ro;Japón@es;
(All instances of "Japan" removed, and ilo
and tl
combined.)
I could do all this in a post-processing script, after exporting, of course. But I'm wondering how far we can go in SPARQL, and without increasing the query time?
Aside: if I change ?country rdfs:label ?label .
into ?country skos:altLabel ?label .
it returns 203 results in 7151ms. The original linked-to query reduces to 20,258 results in 7776ms. Is it running quicker in both cases simply as the total number of strings is 40% of the original? Or is skos:
fundamentally quicker than rdfs:
?
Upvotes: 2
Views: 43
Reputation: 28968
(Using UninformedUser's comment as a starting point.)
First, to get the single column, combining country name and language code together it as simple as:
(GROUP_CONCAT(CONCAT(?label, "@", lang(?label)); SEPARATOR=";") AS ?labels)
BTW, nice tip when experimenting with a query is to put BIND(wd:Q17 AS ?country)
as the first line in the WHERE{...}
to have it run much quicker by just fetching for one country, Japan in this case.
To pull the English labels out of ?labels
we replace the SERVICE wikibase:label
line with ?country rdfs:label ?en_label
followed by a filter on the desired language.
SELECT ?country ?en_label (GROUP_CONCAT(CONCAT(?label, "@", lang(?label)); SEPARATOR=";") AS ?labels)
WHERE
{
?country wdt:P31 wd:Q6256.
?country rdfs:label ?label .
?country rdfs:label ?en_label . FILTER(lang(?en_label) = "en")
# FILTER(str(?label) != str(?en_label))
}
GROUP BY ?country ?en_label
Then to exclude the countries using the English label for the country name, uncomment the FILTER(str(?label) != str(?en_label))
line.
In my test it took 56 secs before, and a time-out with the filter. When testing on one country, adding the filter seemed to double the query time, so that makes sense.
UninformedUser's query runs in 2.6 secs and seems to give the same results:
SELECT ?country ?en_label (GROUP_CONCAT(CONCAT(?label, "@", lang(?label)); SEPARATOR=";") as ?other_labels)
WHERE {
{
SELECT * {
# BIND(wd:Q17 AS ?country)
?country wdt:P31 wd:Q6256.
?country rdfs:label ?en_label .
FILTER(lang(?en_label) = "en")
hint:SubQuery hint:runOnce true .
}
}
hint:Prior hint:runFirst true .
?country rdfs:label ?label .
FILTER(str(?label) != str(?en_label))
}
GROUP BY ?country ?en_label
I assume the nested SELECT * {...}
is being used to be able to give the two hint
. At this stage I don't know if it is nested SELECT, the hint:SubQuery
or the hint:Prior
that makes such a big difference in run-time.
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/query_optimization maybe explains it.
Given the lack of answer for it so far, there seems no way to concat the country codes, when the label is the same. E.g. for Japan, "Japon" is used for French and 15 others, "Япония" is used for Russian, and 9 other languages.
Upvotes: 0