Stephanie Sapienza

Reputation: 31

How to construct SPARQL query for a list of Wikidata items

First off, I'm not a developer, and I'm new to writing SPARQL queries. Mostly I've been looking up existing queries and tweaking them to get what I need. The issue is that most documentation on query construction has to do with getting new data you don't have, rather than retrieving or extending existing data. And when you do find tips for retrieving existing data, they tend to cover ONE item at a time instead of a full data set of many items.

I mostly use OpenRefine for this. I start by loading up my existing list of names and use the Wikidata reconciliation service to match the names to existing Wikidata IDs. So this is where I am now versus where I want to go:

1 - We have a list of Wikidata IDs for reconciled matches;

2 - We have used OpenRefine to get most of the data we need from those;

3 - We don't have the label, description, or Wikipedia links (English), which are extremely valuable;

4 - I have figured out how to construct a query for the label and description of just ONE Wikidata Item:

SELECT ?itemLabel ?itemDescription WHERE {
  VALUES ?item { wd:Q15485689 }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

5 - I have figured out how to construct a query to extract the Wikipedia English URL for just ONE Wikidata item:

SELECT ?article ?lang ?name WHERE {
  ?article schema:about wd:Q15485689;
    schema:inLanguage ?lang;
    schema:name ?name;
    schema:isPartOf [ wikibase:wikiGroup "wikipedia" ].
  FILTER(?lang IN("en"))
  FILTER(!CONTAINS(?name, ":"))
  OPTIONAL { ?article wdt:P31 ?instance_of. }
}

The questions are:

1 - Is there a way to run those two queries across my full list of reconciled Wikidata IDs* at once, instead of one item at a time?

2 - Can they be combined, so that a single query returns the label, description, and English Wikipedia URL for each item?

*we have 667, but I could do smaller batches if that's too much for the service to handle

Ideally, the query would generate output I could download as a CSV file looking much like this (so I can match on it and import the new data into our Airtable base, which feeds the website application):

[screenshot: ideal CSV output]

If anyone can lead me in the right direction here, I'd appreciate it.

I should also note that if OpenRefine has a way of retrieving these, I'm all ears! But since these three fields don't have a property code, I couldn't see how to pull them in from OpenRefine.

Upvotes: 3

Views: 1878

Answers (2)

Andrew Lih

Reputation: 1

Yes, a VALUES statement in SPARQL can take not only hundreds but even thousands of items. I regularly do this when cross-checking how Wikidata matches up against an existing data set. There are other things you can do as well that take lists of Wikidata items.
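
As a minimal sketch of that pattern, here is the single-item label/description query from point 4 of the question, extended with a few placeholder QIDs (borrowed from the other answer) in the VALUES block; substitute your full reconciled list:

# Sketch: the same label/description query, run over several items at once
SELECT ?item ?itemLabel ?itemDescription WHERE {
  VALUES ?item { wd:Q15485689 wd:Q105848230 wd:Q6697407 wd:Q2344502 }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}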

Upvotes: 0

tagishsimon

Reputation: 41

This sort of thing. See how many QIDs you can get away with in the VALUES statement; all of them in one go, probably. This query gives you the URL and the article title; clearly, you can snip the article title column if you do not want it. Note also https://www.wikidata.org/wiki/Wikidata:Request_a_query which is Wikidata's own location for questions such as these.

SELECT ?item ?itemLabel ?itemDescription ?sitelink ?article
WHERE 
{
  VALUES ?item {wd:Q105848230 wd:Q6697407 wd:Q2344502 wd:Q1698206}
  OPTIONAL {
    ?article schema:about ?item ;
    schema:isPartOf <https://en.wikipedia.org/> ; 
    schema:name ?sitelink .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
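
When run at https://query.wikidata.org, the results of a query like this can be saved via the Download menu, which offers a CSV export that should be importable into Airtable.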

Upvotes: 4
