Reputation: 4406
For a project, I download some Wikipedia city pages like the page for my hometown Markdorf. As you can see on the Wikipedia page, the area of the city is displayed next to "Fläche" and the population is displayed next to "Einwohner".
How can I get this data from the API? When I download the JSON version of the Wikipedia page for Markdorf, the response of course contains "Fläche" and "Einwohner", but without a value next to them. I expected to get these values the same way as "Landkreis", which is returned as a key-value pair in the JSON version of the page: Landkreis = Bodenseekreis.
Fläche is listed as Fläche<ref name="Daten & Fakten">[http://markdorf.de/index.php?id=351 ''Daten & Fakten''] auf der Internetseite der Stadt Markdorf, abgerufen am 29. Mai 2015.</ref> | without any data. The referenced website http://markdorf.de/index.php?id=351 of course contains the information, but not in a parsable form.
So: how can I access information like Fläche and Einwohner using the Wikipedia API? Also, Bevölkerungsdichte is not returned at all.
Upvotes: 0
Views: 319
Reputation: 1384
Tgr is right: you should use a structured data source instead of trying to parse the wikitext directly. You could use the Wikidata Query Service to build a SPARQL query that returns the area and population based on the name of the town. That query might look like this:
SELECT ?town ?townLabel ?area ?population WHERE {
?town rdfs:label "Markdorf"@de. # find the item labeled "Markdorf" in German
?town wdt:P2046 ?area. # get the area(wdt:P2046) of that item
?town wdt:P1082 ?population. # get the population(wdt:P1082) of that item
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}
The results of that query can be accessed via the Wikidata JSON endpoint (the query is just URL-encoded as the query parameter in that URL).
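For illustration, here is a minimal Python sketch of that request using only the standard library. It assumes the public endpoint at https://query.wikidata.org/sparql with a `format=json` parameter and the standard SPARQL JSON result layout (`results.bindings`). The helper names `build_url`, `extract_rows`, and `fetch`, as well as the User-Agent string, are made up for this example, and the query uses `rdfs:label` as the label predicate.

```python
import json
import urllib.parse
import urllib.request

# Assumed public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?town ?townLabel ?area ?population WHERE {
  ?town rdfs:label "Markdorf"@de.   # item labeled "Markdorf" in German
  ?town wdt:P2046 ?area.            # area
  ?town wdt:P1082 ?population.      # population
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}
"""


def build_url(query):
    """URL-encode the SPARQL text as the `query` parameter and ask for JSON."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return ENDPOINT + "?" + params


def extract_rows(response):
    """Flatten SPARQL JSON results into plain {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in response["results"]["bindings"]
    ]


def fetch(query):
    """Send the query over HTTP and return the flattened rows."""
    request = urllib.request.Request(
        build_url(query),
        # Hypothetical identifier; replace with your own tool name and contact.
        headers={"User-Agent": "city-facts-example/0.1"},
    )
    with urllib.request.urlopen(request) as resp:
        return extract_rows(json.load(resp))
```

Calling `fetch(QUERY)` performs the network request and should return one dict per matching item, with the keys `town`, `townLabel`, `area`, and `population`. All values arrive as strings, so convert `area` and `population` to numbers before computing something like the population density.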
Upvotes: 1