Alkis Kalogeris

Reputation: 17745

Wikipedia API infobox

I'm using the Wikipedia API to get the infoboxes from certain pages. An example would be Imperial College London. My problem is the `HESA student population|INSTID=0132` value that I'm getting: I was hoping to get just the number for the student population, but instead I'm getting the ID above. How can I get the values of the infoboxes present in a page?

Moreover, if you check the wiki page, there are two infoboxes (main and rankings). How can I get both of them?

Upvotes: 2

Views: 1746

Answers (1)

Petr

Reputation: 6269

There's an alternative REST API you could use to access Wikipedia content. To get the well-structured HTML for an article, you would request:

https://en.wikipedia.org/api/rest_v1/page/html/Imperial_College_London

The HTML is produced by the Parsoid service, which emits HTML/RDFa content following the DOM Spec. Infoboxes will be HTML table elements with class `infobox`, so you could easily locate all infoboxes on the page.

Infoboxes are normally created by complex templates, so it might be easier for you to just parse the table HTML.
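
As a rough illustration of parsing the table HTML, here is a minimal sketch using only the Python standard library. It assumes each infobox row is a `<tr>` with a `<th>` label and a `<td>` value, which may not hold for every template. The names `InfoboxParser`, `extract_infoboxes` and `fetch_infoboxes`, and the User-Agent string, are placeholders of mine, not part of any Wikipedia API. Because the HTML is already rendered, template calls such as the HESA one should show up as their resolved text, though I haven't checked that for every page.

```python
from html.parser import HTMLParser
from urllib.parse import quote
from urllib.request import Request, urlopen

REST_HTML_URL = "https://en.wikipedia.org/api/rest_v1/page/html/"


class InfoboxParser(HTMLParser):
    """Collects (label, value) rows from every <table> whose class includes 'infobox'."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.infoboxes = []  # one list of (label, value) pairs per infobox
        self._depth = 0      # table nesting depth inside the current infobox
        self._cell = None    # "th" or "td" while inside a top-level cell
        self._row = None     # text fragments collected for the current row
        self._skip = False   # True while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip = True
        elif tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self._depth:
                self._depth += 1  # nested table inside an infobox
            elif "infobox" in classes:
                self._depth = 1
                self.infoboxes.append([])
        elif self._depth == 1:
            if tag == "tr":
                self._row = {"th": [], "td": []}
            elif tag in ("th", "td") and self._row is not None:
                self._cell = tag

    def handle_endtag(self, tag):
        if tag in ("style", "script"):
            self._skip = False
            return
        if not self._depth:
            return
        if tag == "table":
            self._depth -= 1
        elif self._depth == 1:
            if tag in ("th", "td"):
                self._cell = None
            elif tag == "tr" and self._row is not None:
                # Collapse whitespace; keep only rows with both label and value.
                label = " ".join("".join(self._row["th"]).split())
                value = " ".join("".join(self._row["td"]).split())
                if label and value:
                    self.infoboxes[-1].append((label, value))
                self._row = None

    def handle_data(self, data):
        if self._depth and self._cell and self._row is not None and not self._skip:
            self._row[self._cell].append(data)


def extract_infoboxes(html):
    """Return a list of infoboxes, each a list of (label, value) tuples."""
    parser = InfoboxParser()
    parser.feed(html)
    parser.close()
    return parser.infoboxes


def fetch_infoboxes(title):
    """Download the Parsoid HTML for a page title and extract its infoboxes."""
    url = REST_HTML_URL + quote(title.replace(" ", "_"), safe="")
    # Placeholder User-Agent; replace it with one that identifies your script.
    req = Request(url, headers={"User-Agent": "infobox-example/0.1 (demo script)"})
    with urlopen(req) as resp:
        return extract_infoboxes(resp.read().decode("utf-8"))
```

Calling `fetch_infoboxes("Imperial College London")` should then give one list per infobox on the page, so the main and rankings boxes come back as separate entries. Rows that have only a header or only a value (images, section titles) are dropped by this sketch.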

Upvotes: 2
