Reputation: 33
I'm trying to clean up the results of a Wikidata query. If you look up IBM, for example, you'll see multiple entries for it. I'd like to show only the first result for each "wd:" item.
Is there a way to use FILTER or EXISTS in this case? Something like: if a ?item result was already found, move on, etc. How would one deal with this in SPARQL syntax?
I've tried to do it with "GROUP BY", as I've seen some people mention it, but it didn't work.
SELECT DISTINCT (SAMPLE (?item) AS ?item) ?itemLabel ?website ?countryLabel ?industryLabel ?headquartersLabel
WHERE {
  ?item wdt:P452 ?industry ;
        wdt:P17 ?country .
  FILTER((?industry = wd:Q11661) ||
         (?industry = wd:Q11016) ||
         (?industry = wd:Q880371) ||
         (?industry = wd:Q3966) ||
         (?industry = wd:Q1481411) ||
         (?industry = wd:Q1540863) ||
         (?industry = wd:Q638608))
  OPTIONAL { ?item wdt:P856 ?website . } # gets website
  OPTIONAL { ?item wdt:P159 ?headquarters . }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en"
  }
} GROUP BY ?item ?itemLabel ?website ?countryLabel ?industryLabel ?headquartersLabel
I've also tried a nested SELECT. It works, but it doesn't return the rest of the table.
SELECT ?item ?itemLabel ?website ?country ?countryLabel ?industry ?industryLabel
WHERE {
  SELECT DISTINCT ?item WHERE {
    ?item wdt:P452 ?industry ;
          wdt:P17 ?country .
    FILTER((?industry = wd:Q11661) ||
           (?industry = wd:Q11016) ||
           (?industry = wd:Q880371) ||
           (?industry = wd:Q3966) ||
           (?industry = wd:Q1481411) ||
           (?industry = wd:Q1540863) ||
           (?industry = wd:Q638608))
    OPTIONAL { ?item wdt:P856 ?website . } # gets website
    SERVICE wikibase:label {
      bd:serviceParam wikibase:language "[AUTO_LANGUAGE],fr,ar,be,bg,bn,ca,cs,da,de,el,en,es,et,fa,fi,he,hi,hu,hy,id,it,ja,jv,ko,nb,nl,eo,pa,pl,pt,ro,ru,sh,sk,sr,sv,sw,te,th,tr,uk,yue,vec,vi,zh"
    }
  }
}
ORDER BY ?item
Upvotes: 1
Views: 426
Reputation: 1966
The problem with your initial approach is that GROUP BY on ?item ?itemLabel ?website ?countryLabel ?industryLabel ?headquartersLabel returns a new row for every distinct combination of those values. An item with two websites and two industries therefore appears four times, e.g.:
| wd:Q123 | Company1 | co1.com | Tech |
| wd:Q123 | Company1 | co1.co.uk | Tech |
| wd:Q123 | Company1 | co1.com | Pharma |
| wd:Q123 | Company1 | co1.co.uk | Pharma |
You can do two things:
1. Return a concatenation of all the industries, websites, etc., but this times out. It would return something like this:
| wd:Q123 | Company1 | co1.com , co1.co.uk | Tech , Pharma |
2. Return a sample of each industry, website, etc., which could be:
| wd:Q123 | Company1 | co1.com | Pharma |
Of course, Company2 may share one or more, but not all, industries with Company1; because each row shows only a sample, the two companies may appear to be in different industries. This second approach seems to work for me:
SELECT ?item ?itemLabel ?industryLabel ?countryLabel ?websiteLabel ?hqLabel
WHERE {
  {
    SELECT ?item ?itemLabel
           (SAMPLE(?industry) AS ?industry) (SAMPLE(?country) AS ?country)
           (SAMPLE(?website) AS ?website) (SAMPLE(?hq) AS ?hq)
    WHERE {
      ?item wdt:P452 ?industry ;
            wdt:P17 ?country .
      OPTIONAL { ?item wdt:P856 ?website . } # gets website
      OPTIONAL { ?item wdt:P159 ?hq . }
      {
        SELECT DISTINCT ?item ?itemLabel
        WHERE {
          ?item wdt:P452 ?industry .
          VALUES ?industry { wd:Q11661
                             wd:Q11016
                             wd:Q880371
                             wd:Q3966
                             wd:Q1481411
                             wd:Q1540863
                             wd:Q638608 }
          SERVICE wikibase:label {
            bd:serviceParam wikibase:language "en"
          }
        }
      }
    } GROUP BY ?item ?itemLabel
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en"
  }
}
ORDER BY ?itemLabel
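For comparison, the first option (concatenating every value per item) could be sketched with GROUP_CONCAT roughly as follows. Treat it as a sketch rather than a tested solution: the output names ?industries and ?websites are made up for illustration, the label service is used in its manual mode so that ?industryLabel is bound before the aggregation runs, and, as mentioned above, it may well time out on the public endpoint for this set of industries.

```sparql
# Sketch only: one row per item, with all websites and industries concatenated.
# ?industries and ?websites are illustrative names, not part of the original query.
SELECT ?item ?itemLabel
       (GROUP_CONCAT(DISTINCT STR(?website); separator=", ") AS ?websites)
       (GROUP_CONCAT(DISTINCT ?industryLabel; separator=", ") AS ?industries)
WHERE {
  VALUES ?industry { wd:Q11661 wd:Q11016 wd:Q880371 wd:Q3966
                     wd:Q1481411 wd:Q1540863 wd:Q638608 }
  ?item wdt:P452 ?industry .
  OPTIONAL { ?item wdt:P856 ?website . }
  # Manual mode: bind the labels explicitly so they can be aggregated.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
    ?item rdfs:label ?itemLabel .
    ?industry rdfs:label ?industryLabel .
  }
}
GROUP BY ?item ?itemLabel
ORDER BY ?itemLabel
```

Only the industries listed in VALUES are concatenated here, since the same ?industry variable is used for filtering and for output. If it does time out, adding a LIMIT or narrowing the VALUES list is the first thing I would try.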
Upvotes: 4