crzy

Reputation: 33

Duplicated items on SPARQL query (Wikidata)

I'm trying to clean up some results from a Wikidata query. If you look up IBM, for example, you'll see multiple entries for it. I'd like to show only the first result for each "wd:" item.

Is there a way to use FILTER or EXISTS in this case? For example, if a result for ?item has already been found, move on to the next one. How would one handle this in SPARQL syntax?

I've tried to do it with "GROUP BY", as I've seen some people mention it, but it didn't work.

SELECT DISTINCT (SAMPLE (?item) AS ?item) ?itemLabel ?website ?countryLabel ?industryLabel ?headquartersLabel
WHERE  {
  
    ?item wdt:P452 ?industry ;
          wdt:P17  ?country .
          FILTER((?industry = wd:Q11661)  || 
                 (?industry = wd:Q11016)  ||
                 (?industry = wd:Q880371) ||
                 (?industry = wd:Q3966)   ||
                 (?industry = wd:Q1481411)||
                 (?industry = wd:Q1540863)||
                 (?industry = wd:Q638608))

    OPTIONAL{ ?item wdt:P856 ?website . }    # gets website
    OPTIONAL{ ?item wdt:P159 ?headquarters . } 
    SERVICE wikibase:label {
       bd:serviceParam wikibase:language "en"
    }

} GROUP BY ?item ?itemLabel ?website ?countryLabel ?industryLabel ?headquartersLabel

I've also tried a nested SELECT. It works, but it doesn't return the rest of the table.

SELECT ?item ?itemLabel ?website ?country ?countryLabel ?industry ?industryLabel
WHERE  {
    SELECT DISTINCT ?item WHERE {
      
    ?item wdt:P452 ?industry ;
          wdt:P17  ?country .
          FILTER((?industry = wd:Q11661)  || 
                 (?industry = wd:Q11016)  ||
                 (?industry = wd:Q880371) ||
                 (?industry = wd:Q3966)   ||
                 (?industry = wd:Q1481411)||
                 (?industry = wd:Q1540863)||
                 (?industry = wd:Q638608))

    OPTIONAL{ ?item wdt:P856 ?website . }    # gets website
    
    SERVICE wikibase:label {
       bd:serviceParam wikibase:language "[AUTO_LANGUAGE],fr,ar,be,bg,bn,ca,cs,da,de,el,en,es,et,fa,fi,he,hi,hu,hy,id,it,ja,jv,ko,nb,nl,eo,pa,pl,pt,ro,ru,sh,sk,sr,sv,sw,te,th,tr,uk,yue,vec,vi,zh"
    }}
} 
ORDER BY ?item


Answers (1)

Valerio Cocchi

Reputation: 1966

The problem with your initial approach is that you group by every selected variable, so each distinct combination of ?item ?itemLabel ?website ?countryLabel ?industryLabel ?headquartersLabel is returned as its own row. E.g.

| wd:Q123 | Company1 | co1.com   | Tech   |
| wd:Q123 | Company1 | co1.co.uk | Tech   |
| wd:Q123 | Company1 | co1.com   | Pharma |
| wd:Q123 | Company1 | co1.co.uk | Pharma |
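
To see this multiplication on real data, a rough diagnostic along these lines may help. It is only a sketch: the two-entry VALUES list and the LIMIT are shortened for illustration, and it assumes the public Wikidata endpoint, where the wd: and wdt: prefixes are predefined. It counts how many rows each item produces before any deduplication.

```sparql
# Diagnostic sketch: how many result rows does each item generate?
# The VALUES list is shortened for illustration; use the full list as needed.
SELECT ?item (COUNT(*) AS ?rows)
WHERE {
  VALUES ?industry { wd:Q11661 wd:Q11016 }
  ?item wdt:P452 ?industry ;
        wdt:P17  ?country .
  OPTIONAL { ?item wdt:P856 ?website . }
  OPTIONAL { ?item wdt:P159 ?headquarters . }
}
GROUP BY ?item
HAVING (COUNT(*) > 1)      # keep only items that appear more than once
ORDER BY DESC(?rows)
LIMIT 20
```

An item with two industries, two websites and one country would show up here with four rows, matching the table above.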

You can do one of two things: 1- Return a concatenation of all the industries, websites, etc., but this times out. It would return something like this:

| wd:Q123 | Company1 | co1.com , co1.co.uk | Tech , Pharma |
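
For reference, a concatenating query could look roughly like the sketch below. This is not the query I ran: the VALUES list is shortened, labels are fetched through rdfs:label instead of the label service (an assumption made to keep the aggregation simple), and the LIMIT is arbitrary. It may well still time out on the full industry list.

```sparql
# Sketch only: one row per item, multi-valued properties joined into strings.
# Assumes the public Wikidata endpoint (wd:, wdt:, rdfs: are predefined there).
SELECT ?item
       (GROUP_CONCAT(DISTINCT ?industryName; separator=", ") AS ?industries)
       (GROUP_CONCAT(DISTINCT ?website; separator=", ") AS ?websites)
WHERE {
  VALUES ?industry { wd:Q11661 wd:Q11016 }   # shortened list, for illustration
  ?item wdt:P452 ?industry .
  ?industry rdfs:label ?industryName .
  FILTER(LANG(?industryName) = "en")         # English industry names only
  OPTIONAL { ?item wdt:P856 ?website . }
}
GROUP BY ?item
LIMIT 100
```

Note that the LIMIT is applied after grouping, so it is the narrower VALUES list, not the LIMIT, that reduces the amount of work.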

2- Return a sample of each industry, website, etc., which could be:

| wd:Q123 | Company1 | co1.com | Pharma |

Of course, Company2 may share one or more, but not all, industries with Company1; because you take a sample, the two may appear to be in different industries. This latter approach seems to work for me:

SELECT ?item ?itemLabel ?industryLabel ?countryLabel ?websiteLabel ?hqLabel
WHERE {
  {
    # One row per item: sample a single value for each multi-valued property
    SELECT ?item ?itemLabel
           (SAMPLE(?industry) AS ?industry) (SAMPLE(?country) AS ?country)
           (SAMPLE(?website) AS ?website) (SAMPLE(?hq) AS ?hq)
    WHERE {
      ?item wdt:P452 ?industry ;
            wdt:P17  ?country .

      OPTIONAL { ?item wdt:P856 ?website . }    # gets website
      OPTIONAL { ?item wdt:P159 ?hq . }         # gets headquarters

      {
        # Distinct items in the industries of interest, with their labels
        SELECT DISTINCT ?item ?itemLabel
        WHERE {
          ?item wdt:P452 ?industry .
          VALUES ?industry { wd:Q11661
                             wd:Q11016
                             wd:Q880371
                             wd:Q3966
                             wd:Q1481411
                             wd:Q1540863
                             wd:Q638608 }

          SERVICE wikibase:label {
            bd:serviceParam wikibase:language "en"
          }
        }
      }
    }
    GROUP BY ?item ?itemLabel
  }
  # Resolve labels for the sampled values
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en"
  }
}
ORDER BY ?itemLabel

