Reputation: 73
I am trying to search by section with the Wikipedia API.
What I already know: For the below: https://en.wikipedia.org/w/api.php?format=xml&action=query&prop=revisions&titles=Game_of_Thrones_(season_1)&rvprop=content&rvsection=0
I know the rvsection=0 will give me section 0 of the Wikipedia page and I can change this to get different sections of the page eg. 1,2,3.
What I am wondering is how/if I can search via section name? Eg. In the link above on the Wikipedia page there is a section named "Episodes", how can I search for this, so I get all the content from this section.
If this is not possible, is there a work around for this? What I am wanting to do is get Episode information from different Wikipedia pages.
Upvotes: 1
Views: 819
Reputation: 73
I have done some more researching into this and have sort of found a solution.
If we want to get a certain section then we need to query this information with the API below:
https://en.wikipedia.org/w/api.php?action=parse&format=json&page=(NAME_OF_WIKI_PAGE)&prop=sections&disabletoc=1
This will give us JSON info about names of each section.
Once we have section info, use the parse API to get the wikitext. If we want the HTML, we can change prop to text:
https://en.wikipedia.org/w/api.php?action=parse&format=json&page=(NAME_OF_WIKI)&prop=wikitext§ion=(SECTION #)&disabletoc=1
As a result, we get the specific section we want formatted in JSON. The next step for me is sorting this and trying to get this HTML/wiki text into plain text.
Upvotes: 2