Reputation: 133
I wanna use Sheets to query pages from wikidata and scrape a specific section but I couldn't find anything focused specifically on this, and since I'm a beginner in this I don't know really where to start from. So, I have a list of Q identifiers and I'd like to use them to query the page, and then check if there's a specific section there (or scrape the data from it if possible) otherwise return false. I started with what I found here
=ImportXml(concat("https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&
ppprop=wikibase_item&redirects=1&format=xml&titles=",G1),"//@wikibase_item")
but used instead the link for wiki data (it generates a working link, but I'm not sure if this works as the equivalent of an API and if I can further query it to get data) and the wiki Property code that I wanna get (date of death, /wiki/Property:P570), but I get the "Imported content is empty." error, for this link. Ideally, I would've like to get the date of death value (20 November 2014), or at least a TRUE , meaning that the section exists, and the person is dead.
=IMPORTXML(CONCAT("https://www.wikidata.org/wiki/",A2),"/wiki/Property:P570")
So I might have some Q-links that don't have this section/property at all, and for which I should get an error, but I don't know why it's not working for this one either, do I have to make Xpath to div, or can I use the wiki Property?
I hope this makes sense, I'll put the sample sheet here. Thanks
Upvotes: 2
Views: 225
Reputation: 1
try:
=QUERY(IMPORTXML("https://www.wikidata.org/wiki/"&A1, "//*"),
"select Col2 where Col1 = 'date of death' and Col2 is not null")
or:
=QUERY(IMPORTXML("https://www.wikidata.org/wiki/"&A1, "//*"),
"select Col2 where Col1 = 'date of death' and Col2 is not null")<>""
and for no match:
=IFERROR(QUERY(IMPORTXML("https://www.wikidata.org/wiki/"&A1, "//*"),
"select Col2 where Col1 = 'date of death' and Col2 is not null"), FALSE)
=IFERROR(REGEXEXTRACT(QUERY(IMPORTXML("wikidata.org/wiki/"&A2, "//*"),
"select Col2 where Col1 = 'date of birth' and Col2 is not null"),
"(.*) \d.*reference.*"), FALSE)
Upvotes: 4