Jon295087

Reputation: 741

Return specific data from a Wikipedia Page using API

I want to parse geographic pages (i.e. landmarks, places of interest) on Wikipedia and return a JSON file that contains only the page title and the GIS coordinates scraped from the page(s).

So for example, looking at the page: https://en.wikipedia.org/wiki/The_Sanctuary

The API call https://en.wikipedia.org/w/api.php?action=query&titles=The%20Sanctuary&prop=revisions&rvprop=content&format=json returns the entire page content.

However, I just want to return the following elements:

"title":"The Sanctuary" coord|51.41000|N|1.83173|W
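In the coord string above, the N/S and E/W letters carry the sign of each value: a western longitude is negative in decimal degrees. The sketch below converts that simple decimal form into a signed (lat, lon) pair. `parse_coord` is a hypothetical helper; it assumes only the `coord|lat|N/S|lon|E/W` layout and rejects anything else (for example degree/minute/second variants).

```python
def parse_coord(template: str) -> tuple[float, float]:
    """Convert a decimal-degree coord string such as 'coord|51.41000|N|1.83173|W'
    into a signed (lat, lon) pair. Hypothetical helper: only the simple
    lat|N/S|lon|E/W form is supported."""
    parts = [p.strip() for p in template.strip("{}").split("|")]
    if len(parts) < 5 or parts[0].lower() != "coord":
        raise ValueError(f"unsupported coord format: {template!r}")
    ns, ew = parts[2].upper(), parts[4].upper()
    if ns not in ("N", "S") or ew not in ("E", "W"):
        # Degree/minute/second forms put numbers where the letters are expected.
        raise ValueError(f"expected decimal lat|N/S|lon|E/W in {template!r}")
    lat, lon = float(parts[1]), float(parts[3])
    # South latitudes and west longitudes are negative in decimal degrees.
    if ns == "S":
        lat = -lat
    if ew == "W":
        lon = -lon
    return lat, lon


print(parse_coord("coord|51.41000|N|1.83173|W"))  # (51.41, -1.83173)
```

The negative longitude this produces matches how the API's coordinates data reports the same point.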

Can anyone advise how to structure the web service call correctly?

This is my first attempt at scraping content from pages, so any guidance is greatly appreciated.

Upvotes: 0

Views: 84

Answers (1)

Tgr

Reputation: 28160

The rule of thumb for scraping is: don't do it. Many things are available through the API (use the API sandbox to discover them), and for most other interesting data someone has already written a library.

In this case, action=query&titles=The_Sanctuary&prop=coordinates will get you what you want:

{
    "batchcomplete": "",
    "query": {
        "pages": {
            "788970": {
                "pageid": 788970,
                "ns": 0,
                "title": "The Sanctuary",
                "coordinates": [
                    {
                        "lat": 51.41,
                        "lon": -1.83173,
                        "primary": "",
                        "globe": "earth"
                    }
                ]
            }
        }
    }
}
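A minimal Python sketch of the same call, reducing the response to just the title and coordinates as the question asks. It uses only the standard library; `format=json` is added so the API returns JSON, and the function names and the User-Agent string are hypothetical placeholders (Wikimedia asks API clients to send a descriptive User-Agent). Parsing is kept separate from fetching so it can be checked against the sample response above without network access.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"


def build_url(title: str) -> str:
    # The query from the answer, plus format=json so the body is plain JSON.
    params = {"action": "query", "titles": title,
              "prop": "coordinates", "format": "json"}
    return API + "?" + urllib.parse.urlencode(params)


def extract(response: dict) -> list[dict]:
    """Keep only each page's title and its lat/lon pairs."""
    results = []
    for page in response.get("query", {}).get("pages", {}).values():
        coords = [{"lat": c["lat"], "lon": c["lon"]}
                  for c in page.get("coordinates", [])]
        results.append({"title": page.get("title"), "coordinates": coords})
    return results


def fetch(title: str) -> list[dict]:
    # Placeholder User-Agent; replace with something that identifies your tool.
    req = urllib.request.Request(
        build_url(title),
        headers={"User-Agent": "coord-example/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract(json.load(resp))
```

Calling `json.dumps(fetch("The Sanctuary"))` then gives a small JSON document with only the title and coordinates. Because `titles` accepts several titles separated by `|`, the same code can handle a batch of pages in one request.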

Upvotes: 1
