Thomas Kekeisen

Reputation: 4406

How to get missing information for a city from Wikipedia

For a project, I download some Wikipedia city pages, such as the page for my hometown Markdorf. As you can see on the Wikipedia page, the area of the city is displayed next to "Fläche" (area) and the population is displayed next to "Einwohner" (inhabitants).

Screenshot of the Wikipedia page of Markdorf

How can I get this data from the API? When I download the JSON version of the Wikipedia page of Markdorf, the response does contain "Fläche" and "Einwohner", but without a value next to either of them. I expected to get these values the same way as "Landkreis" (district), which is returned as a key-value pair in the JSON version of the page: Landkreis = Bodenseekreis.

Fläche is listed as Fläche<ref name="Daten & Fakten">[http://markdorf.de/index.php?id=351 ''Daten & Fakten''] auf der Internetseite der Stadt Markdorf, abgerufen am 29.&nbsp;Mai 2015.</ref> | without any data. The referenced website http://markdorf.de/index.php?id=351 does contain the information, but not in a parsable form.

Screenshot of the Wikipedia api response for Markdorf

So, how can I access information such as Fläche and Einwohner using the Wikipedia API? In addition, Bevölkerungsdichte (population density) is not returned at all.

Upvotes: 0

Views: 319

Answers (1)

Jan Drewniak

Reputation: 1384

Tgr is right: you should use a structured data source instead of trying to parse the wikitext directly. You could use the Wikidata Query Service to build a SPARQL query that returns the area and population based on the name of the town. That query might look like this:

SELECT ?town ?townLabel ?area ?population WHERE {
  ?town rdfs:label "Markdorf"@de. # find the item labeled "Markdorf" in German
  ?town wdt:P2046 ?area.          # get the area(wdt:P2046) of that item
  ?town wdt:P1082 ?population.    # get the population(wdt:P1082) of that item
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}

Link to the query above

The results of that query can be accessed via the Wikidata JSON endpoint; the query is simply URL-encoded and passed as the query parameter of that URL.
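If you want to do this from code rather than the browser, something along these lines might work. It is only a sketch, assuming the public endpoint at https://query.wikidata.org/sparql with a `format=json` parameter and the standard SPARQL JSON results layout (`results.bindings`). The function names, the User-Agent string, and the choice to match on `rdfs:label` with a fixed `de` label language are my own assumptions, not part of any Wikidata client library.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint (assumed to be the one linked above).
ENDPOINT = "https://query.wikidata.org/sparql"

# Same shape as the query above; %s is replaced by the escaped town name.
QUERY_TEMPLATE = """
SELECT ?town ?townLabel ?area ?population WHERE {
  ?town rdfs:label "%s"@de.
  ?town wdt:P2046 ?area.
  ?town wdt:P1082 ?population.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
}
"""


def build_query(name: str) -> str:
    """Insert the town name into the query, escaping backslashes and quotes."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return QUERY_TEMPLATE % escaped


def build_url(name: str) -> str:
    """URL-encode the query into the 'query' parameter and ask for JSON."""
    params = urllib.parse.urlencode({"query": build_query(name), "format": "json"})
    return f"{ENDPOINT}?{params}"


def parse_bindings(payload: dict) -> list:
    """Flatten SPARQL JSON results into plain dicts, one per result row."""
    rows = []
    for binding in payload["results"]["bindings"]:
        rows.append({
            "town": binding["town"]["value"],
            "label": binding["townLabel"]["value"],
            "area": float(binding["area"]["value"]),
            # Population comes back as a decimal string, so go through float.
            "population": int(float(binding["population"]["value"])),
        })
    return rows


def fetch_town(name: str, timeout: float = 30.0) -> list:
    """Run the query over HTTP; needs network access."""
    request = urllib.request.Request(
        build_url(name),
        # Hypothetical User-Agent; replace it with one identifying your project.
        headers={"User-Agent": "city-facts-example/0.1 (hypothetical)"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(json.load(response))
```

Calling `fetch_town("Markdorf")` should then give a list with one dict per matching item. Keep in mind that several items can share a label, so you may need to filter the rows further.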

Upvotes: 1
