nambrot

Reputation: 2571

Extract Coordinates + Zoomlevel from Wikipedia XML Dump

I am looking to extract the location information of a Wikipedia article from the XML dump. This is fairly simple if the article uses the coord template, which shows up as a template tag with the name Coord or coord.

However, older articles may use a different syntax that puts the coordinates directly into the infobox without the coord template. It is easy to extract the coordinates in that case, but more difficult to get the context of the location.

Some articles have streamlined subdivision parameters and some have a coordinates_type parameter, but so far I haven't found a good way to determine the zoom level for the corresponding map.

Can anyone help?

Upvotes: 0

Views: 265

Answers (2)

Philippe Green

Reputation: 952

Not sure if the Wikipedia API existed back when this question was first asked. However, currently you can query Wikipedia's API for the coordinates of an article. For example:

http://en.wikipedia.org/w/api.php?action=query&titles=Ann_Arbor,_Michigan&prop=coordinates&format=json

Not sure about map zoom level though...
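For anyone who wants to script this, here is a minimal sketch of building that query and reading the result. The function names and the `SAMPLE` payload are my own, and the sample is hand-written to match the shape the API returns as far as I know (a `query.pages` object keyed by page ID, each page carrying a `coordinates` list with `lat` and `lon`). No network call is made here; fetch the URL with whatever HTTP client you prefer.

```python
import json
from urllib.parse import urlencode

# Assumption: the standard English Wikipedia endpoint, over HTTPS.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_coordinates_url(title):
    """Build the same query as the URL above for an arbitrary title."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "coordinates",
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_coordinates(response_text):
    """Return a list of (title, lat, lon) for pages that carry coordinates."""
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", {})
    results = []
    for page in pages.values():
        # Pages without coordinates simply lack the key.
        for coord in page.get("coordinates", []):
            results.append((page.get("title"), coord["lat"], coord["lon"]))
    return results


# Hand-written sample in the assumed response shape (values are illustrative).
SAMPLE = (
    '{"query": {"pages": {"1": {"pageid": 1, "title": "Example", '
    '"coordinates": [{"lat": 42.28, "lon": -83.74, '
    '"primary": "", "globe": "earth"}]}}}}'
)

print(build_coordinates_url("Ann_Arbor,_Michigan"))
print(extract_coordinates(SAMPLE))
```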

Upvotes: 1

nambrot

Reputation: 2571

My solution is as follows:

Check for the Coord template first; it is the most reliable source. Note that you should only use the template call that has display=title, since inline-only coordinates may refer to other places mentioned in the article.
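A rough sketch of that first step, assuming you already have the article's raw wikitext as a string. The helper names (`parse_coord`, `find_title_coord`) are made up for this example, and the regex deliberately ignores Coord calls that contain nested templates, so treat it as a starting point rather than a full parser. It handles the signed decimal form and the degrees/minutes/seconds forms with N/S/E/W.

```python
import re

# Matches a {{coord|...}} or {{Coord|...}} call without nested templates.
COORD_RE = re.compile(r"\{\{\s*[Cc]oord\s*\|([^{}]*)\}\}")
# Assumption: these display values all put the coordinates in the title area.
TITLE_FLAGS = {"title", "t", "it", "ti"}


def _to_decimal(parts, direction):
    """Turn [deg, min, sec] strings plus N/S/E/W into signed decimal degrees."""
    value = sum(float(p) / 60 ** i for i, p in enumerate(parts))
    return -value if direction in ("S", "W") else value


def parse_coord(body):
    """Parse the text between '{{coord|' and '}}' into a dict, or None."""
    named, geo, tokens = {}, "", []
    for part in (p.strip() for p in body.split("|")):
        if "=" in part:                      # named parameter, e.g. display=title
            key, value = part.split("=", 1)
            named[key.strip().lower()] = value.strip()
        elif ":" in part:                    # e.g. type:city_region:US-MI
            geo = part
        elif part:
            tokens.append(part)
    values, group = [], []
    try:
        for tok in tokens:
            if tok.upper() in ("N", "S", "E", "W"):
                values.append(_to_decimal(group, tok.upper()))
                group = []
            else:
                group.append(tok)
        if not values and len(group) == 2:   # signed decimal form: lat|lon
            values = [float(group[0]), float(group[1])]
    except ValueError:
        return None
    if len(values) != 2:
        return None
    return {"lat": values[0], "lon": values[1], "geo": geo, "named": named}


def find_title_coord(wikitext):
    """Return the first parsable Coord call whose display includes the title."""
    for match in COORD_RE.finditer(wikitext):
        coord = parse_coord(match.group(1))
        if coord is None:
            continue
        display = coord["named"].get("display", "").lower()
        if {flag.strip() for flag in display.split(",")} & TITLE_FLAGS:
            return coord
    return None
```

The returned `geo` string is kept as-is so the type, scale and dim values can be read from it later.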

If you can't find the Coord template, fall back to the older infobox parameters such as latd, lat_d, or lat_degrees.
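Here is one way the fallback could look. Only the latd, lat_d and lat_degrees spellings come from what I described above; the matching minutes, seconds, direction and long* names in `SUFFIXES` are my assumption about the usual companions of those parameters, so adjust the table to whatever your dump actually contains. The sketch also assumes one infobox parameter per line.

```python
import re

# One "| name = value" parameter per line (assumed infobox layout).
PARAM_RE = re.compile(r"^\s*\|\s*([A-Za-z_ ]+?)\s*=\s*([^|\n]*)", re.MULTILINE)

# Suffix after "lat"/"long" once underscores are dropped and case is folded.
SUFFIXES = {
    "d": "deg", "degrees": "deg",
    "m": "min", "minutes": "min",
    "s": "sec", "seconds": "sec",
    "ns": "dir", "ew": "dir", "direction": "dir",
}


def legacy_coordinates(wikitext):
    """Return (lat, lon) in decimal degrees from old-style parameters, or None."""
    fields = {}
    for raw_key, raw_value in PARAM_RE.findall(wikitext):
        key = raw_key.lower().replace("_", "").replace(" ", "")
        for axis in ("lat", "long"):
            suffix = key[len(axis):]
            if key.startswith(axis) and suffix in SUFFIXES and raw_value.strip():
                fields[(axis, SUFFIXES[suffix])] = raw_value.strip()

    def axis_value(axis):
        if (axis, "deg") not in fields:
            return None
        try:
            value = sum(
                float(fields.get((axis, unit), "0")) / divisor
                for unit, divisor in (("deg", 1), ("min", 60), ("sec", 3600))
            )
        except ValueError:
            return None
        # South and west are negative in decimal degrees.
        if fields.get((axis, "dir"), "").upper() in ("S", "W"):
            value = -value
        return value

    lat, lon = axis_value("lat"), axis_value("long")
    return (lat, lon) if lat is not None and lon is not None else None
```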

As for the zoom level, look for the type in the Coord template or the coordinates_type parameter, which can contain dimension (dim), scale, and type, and sometimes a population.
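To turn those values into a zoom level, something along these lines should work. It assumes a Web Mercator map with 256 px tiles (ground resolution of about 156543.03392 * cos(lat) / 2^zoom metres per pixel), a 96 dpi screen for interpreting scale, and a 512 px viewport for interpreting dim. The `TYPE_SCALES` table lists default scales as I remember them from the Coord template documentation, so verify them before relying on the output; `DEFAULT_SCALE` and the preference order dim, then scale, then type are my own choices.

```python
import math
import re

# ASSUMPTION: default scales per type, recalled from the Template:Coord docs.
TYPE_SCALES = {
    "country": 10_000_000, "state": 3_000_000, "adm1st": 1_000_000,
    "adm2nd": 300_000, "city": 100_000, "mountain": 100_000,
    "airport": 30_000, "landmark": 10_000,
}
DEFAULT_SCALE = 300_000          # hypothetical fallback when nothing matches
EQUATOR_M_PER_PX = 156543.03392  # Web Mercator, zoom 0, 256 px tiles
M_PER_SCREEN_PX = 0.0254 / 96    # physical pixel size on a 96 dpi screen


def parse_geo_params(geo):
    """Split 'type:city(75000)_region:US-MI' into a dict."""
    params = {}
    for chunk in geo.split("_"):
        key, sep, value = chunk.partition(":")
        if sep and value:
            params[key.strip().lower()] = value.strip()
    return params


def _zoom_for_resolution(m_per_px, lat):
    """Nearest zoom whose ground resolution matches m_per_px, clamped to 0..18."""
    zoom = math.log2(EQUATOR_M_PER_PX * math.cos(math.radians(lat)) / m_per_px)
    return max(0, min(18, round(zoom)))


def _dim_to_metres(dim):
    """Parse '20km', '500m' or a bare number of metres."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(km|m)?", dim.strip().lower())
    if not match:
        return None
    return float(match.group(1)) * (1000.0 if match.group(2) == "km" else 1.0)


def estimate_zoom(geo, lat, viewport_px=512):
    """Estimate a zoom level from a geo parameter string and the latitude."""
    params = parse_geo_params(geo)
    metres = _dim_to_metres(params["dim"]) if "dim" in params else None
    if metres:
        return _zoom_for_resolution(metres / viewport_px, lat)
    scale = params.get("scale", "")
    if not scale.isdigit() or int(scale) == 0:
        # Drop an optional population suffix such as city(75000).
        type_name = params.get("type", "").split("(")[0].lower()
        scale = TYPE_SCALES.get(type_name, DEFAULT_SCALE)
    return _zoom_for_resolution(int(scale) * M_PER_SCREEN_PX, lat)
```

For example, `estimate_zoom("type:city_region:US-MI", 42.28)` uses the city default scale, while a string containing dim or scale overrides it.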

If neither is present, you need to derive the zoom level from a couple of other sources. I used the population and area parameters found in the infoboxes.
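I no longer have the exact mapping I used, so the following is only a sketch of the idea: treat the area as a circle, take its diameter, and pick the Web Mercator zoom at which that diameter roughly fills the viewport. The 512 px viewport and the default density of 1000 people per km² for the population case are hypothetical values picked for illustration, not anything Wikipedia defines.

```python
import math

EQUATOR_M_PER_PX = 156543.03392  # Web Mercator ground resolution at zoom 0


def parse_number(text):
    """Parse infobox numbers such as '1,234.5'; return None if not numeric."""
    try:
        return float(text.replace(",", "").strip())
    except ValueError:
        return None


def zoom_from_area(area_km2, lat, viewport_px=512):
    """Zoom at which a circle of the given area roughly fills the viewport."""
    if area_km2 <= 0:
        return None
    diameter_m = 2000.0 * math.sqrt(area_km2 / math.pi)
    m_per_px = diameter_m / viewport_px
    zoom = math.log2(EQUATOR_M_PER_PX * math.cos(math.radians(lat)) / m_per_px)
    return max(0, min(18, round(zoom)))


def zoom_from_population(population, lat, density_per_km2=1000.0):
    """Guess an area from the population (assumed density), then reuse it."""
    return zoom_from_area(population / density_per_km2, lat)
```

Area is the better signal when both are available, since the population route depends entirely on the assumed density.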

Upvotes: 1
