Reputation: 17745
I'm using the Wikipedia API to get the infoboxes from certain pages.
An example would be Imperial College London.
My problem is the `HESA student population|INSTID=0132` value that I'm getting. I was hoping to get just the number for the student population, but instead I'm getting the ID above. How can I get the values of the infoboxes present in a page?
Moreover, if you check the wiki page, there are two infoboxes (main and rankings). How can I get both of them?
Upvotes: 2
Views: 1746
Reputation: 6269
There's an alternative REST API you could use to access Wikipedia content. To get the well-structured HTML for an article, you would request:
https://en.wikipedia.org/api/rest_v1/page/html/Imperial_College_London
The HTML is produced by the Parsoid service, which produces HTML/RDFa content following the DOM Spec. Infoboxes will be HTML `table`
elements with class `infobox`, so you can easily locate all infoboxes on the page.
Infoboxes are normally created by complex templates, so it might be easier for you to just parse the table HTML.
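As a rough illustration of parsing the table HTML, here is a minimal sketch using only the Python standard library. It assumes each infobox is a `table` whose `class` attribute includes `infobox`, with labels in `th` cells and values in `td` cells. The names `InfoboxParser`, `extract_infoboxes` and `fetch_html`, the User-Agent string, and the sample HTML are all made up for the example; the sample values are placeholders, not real Wikipedia data.

```python
from html.parser import HTMLParser
from urllib.parse import quote
from urllib.request import Request, urlopen


class InfoboxParser(HTMLParser):
    """Collect (label, value) rows from every <table class="... infobox ...">."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.infoboxes = []  # one list of (label, value) pairs per infobox
        self._depth = 0      # table nesting depth inside the current infobox
        self._cell = None    # "th" or "td" while inside a top-level cell
        self._row = None     # text fragments collected for the current row

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self._depth:
                self._depth += 1          # nested table inside an infobox
            elif "infobox" in classes:
                self._depth = 1           # a new infobox starts here
                self.infoboxes.append([])
        elif self._depth == 1:
            if tag == "tr":
                self._row = {"th": [], "td": []}
                self._cell = None
            elif tag in ("th", "td") and self._row is not None:
                self._cell = tag

    def handle_endtag(self, tag):
        if tag == "table" and self._depth:
            self._depth -= 1
        elif self._depth == 1:
            if tag in ("th", "td"):
                self._cell = None
            elif tag == "tr" and self._row is not None:
                # Join fragments and collapse runs of whitespace.
                label = " ".join("".join(self._row["th"]).split())
                value = " ".join("".join(self._row["td"]).split())
                if label or value:
                    self.infoboxes[-1].append((label, value))
                self._row = None

    def handle_data(self, data):
        # Text inside nested tables is folded into the enclosing cell.
        if self._depth and self._cell and self._row is not None:
            self._row[self._cell].append(data)


def extract_infoboxes(html):
    """Return a list of infoboxes, each a list of (label, value) tuples."""
    parser = InfoboxParser()
    parser.feed(html)
    parser.close()
    return parser.infoboxes


def fetch_html(title):
    """Fetch article HTML from the REST endpoint shown above (needs network)."""
    url = "https://en.wikipedia.org/api/rest_v1/page/html/" + quote(title, safe="")
    # Hypothetical User-Agent; replace with one identifying your own tool.
    req = Request(url, headers={"User-Agent": "infobox-example/0.1"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


# Made-up HTML standing in for a page with two infoboxes.
sample = """
<table class="infobox vcard"><tbody>
<tr><th>Students</th><td>12,345</td></tr>
</tbody></table>
<p>Some article text.</p>
<table class="infobox"><tbody>
<tr><th>Ranking</th><td>3</td></tr>
</tbody></table>
"""

for box in extract_infoboxes(sample):
    print(box)
```

For a real page you would call `extract_infoboxes(fetch_html("Imperial_College_London"))`. Real infoboxes often contain header-only rows, images and footnote markers, so expect to clean up the extracted values further.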
Upvotes: 2