user6446052

Reputation: 93

How to extract WikiTables from Wikipedia page by API?

Using the API sandbox, I am trying to extract all of the text from the Wikipedia page on Ballon_d'Or, including the contents of its tables.

I tried the following query:

https://en.wikipedia.org/w/api.php?action=query&format=json&prop=extracts&titles=Ballon_d%27Or&explaintext=1&exsectionformat=wiki
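For reference, the same query can be sent from Python using only the standard library. This is a minimal sketch: the helper names (`extracts_url`, `first_extract`, `fetch_extract`) and the User-Agent string are my own placeholders, not part of any API. The response nests pages under their page ID, which is why the helper takes the first entry.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
HEADERS = {"User-Agent": "wikitable-example/0.1 (replace-with-your-contact)"}


def extracts_url(title: str) -> str:
    """Build the prop=extracts query from the question for one page title."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "titles": title,
        "explaintext": 1,
        "exsectionformat": "wiki",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def first_extract(data: dict) -> str:
    """Return the extract of the first page in a query response ('' if none)."""
    # Pages are keyed by page ID, so take the first (and only) entry.
    page = next(iter(data["query"]["pages"].values()))
    return page.get("extract", "")


def fetch_extract(title: str) -> str:
    """Fetch the plain-text extract (prose only, no table contents)."""
    req = urllib.request.Request(extracts_url(title), headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return first_extract(json.load(resp))


# Example (needs network access):
# print(fetch_extract("Ballon_d'Or")[:500])
```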

but it returns only the prose, without the contents of wikitables such as this one:

[Image: example of a wikitable from the article]

Is there a way to get the table contents as text, along with the prose I am already getting?

Alternatively, I could scrape the page with Beautiful Soup, but I wanted to look for an API query method first.

Upvotes: 2

Views: 1731

Answers (1)

Termininja

Reputation: 7036

Use action=parse instead of action=query:

https://en.wikipedia.org/w/api.php?action=parse&page=Ballon_d'Or&prop=text
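A minimal standard-library Python sketch of this `action=parse` call is below. The helper names and the User-Agent are hypothetical. I added `format=json` and `formatversion=2` (both real API parameters) so that the rendered HTML comes back as a plain string in `parse.text`; the optional `section` argument is simply passed through as the API's `section` parameter.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
HEADERS = {"User-Agent": "wikitable-example/0.1 (replace-with-your-contact)"}


def parse_url(page: str, section=None) -> str:
    """Build an action=parse request that returns the page's rendered HTML."""
    params = {
        "action": "parse",
        "page": page,
        "prop": "text",
        "format": "json",
        "formatversion": 2,  # parse.text is then a plain HTML string
    }
    if section is not None:
        params["section"] = section  # 0 = lead, 1 = first heading, ...
    return API_URL + "?" + urllib.parse.urlencode(params)


def html_from_response(data: dict) -> str:
    """Pull the rendered HTML out of a formatversion=2 parse response."""
    if "error" in data:
        err = data["error"]
        raise RuntimeError(f"{err.get('code')}: {err.get('info')}")
    return data["parse"]["text"]


def fetch_html(page: str, section=None) -> str:
    """Fetch the rendered HTML of a whole page or of one section."""
    req = urllib.request.Request(parse_url(page, section), headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return html_from_response(json.load(resp))


# Example (needs network access):
# html = fetch_html("Ballon_d'Or")
```

The result is HTML, not plain text, so the tables still need to be converted before they can be merged with the prose extract.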

Adding &section=2 returns only the second section, Winners. Section numbers count from 0 for the lead, and they shift if the article's headings change.
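The HTML returned by `action=parse`, whether for the whole page or a single section, still has to be turned into text. One way to do that with only the standard library is to walk it with `html.parser` and collect the cells of every `<table class="wikitable">` as rows. This is a sketch under simplifying assumptions (the class and function names are my own): `rowspan`/`colspan` are not expanded, nested tables are flattened into the enclosing cell, and footnote markers such as `[1]` stay in the cell text.

```python
from html.parser import HTMLParser


class WikitableParser(HTMLParser):
    """Collect the cell text of every <table class="wikitable ..."> as rows."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so entities arrive decoded
        self.tables = []    # one list of rows per wikitable
        self._depth = 0     # table nesting depth inside a wikitable (0 = outside)
        self._row = None    # cells of the row being read
        self._cell = None   # text fragments of the cell being read

    def _finish_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace/newlines inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _finish_row(self):
        self._finish_cell()
        if self._row:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self._depth:
                self._depth += 1  # nested table: its text joins the outer cell
            elif "wikitable" in (dict(attrs).get("class") or "").split():
                self._depth = 1
                self.tables.append([])
        elif self._depth == 1:
            if tag == "tr":
                self._finish_row()
                self._row = []
            elif tag in ("td", "th"):
                self._finish_cell()
                if self._row is not None:
                    self._cell = []
            elif tag == "br" and self._cell is not None:
                self._cell.append(" ")

    def handle_endtag(self, tag):
        if tag == "table":
            if self._depth == 1:
                self._finish_row()
            if self._depth:
                self._depth -= 1
        elif self._depth == 1:
            if tag in ("td", "th"):
                self._finish_cell()
            elif tag == "tr":
                self._finish_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_wikitables(html: str) -> list:
    """Return a list of tables, each a list of rows of cell strings."""
    parser = WikitableParser()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Each table can then be written out as text, for example with `"\n".join("\t".join(row) for row in table)`, and appended to the prose extract.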

This may also help you later: Regular expression to remove HTML tags
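In the spirit of that linked question, here is a crude regex-based sketch (the names are my own) that drops `<script>`/`<style>` blocks, strips the remaining tags, and decodes entities with `html.unescape`. Regular expressions are not an HTML parser, so this only suits quick plain-text dumps; a real parser such as Python's `html.parser` or Beautiful Soup is more robust.

```python
import html
import re

# Drop <script>/<style> blocks entirely; their contents are not article text.
SCRIPT_STYLE_RE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
# Anything else that looks like a tag.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(fragment: str) -> str:
    """Crude HTML-to-text: remove tags, decode entities, normalize whitespace."""
    text = SCRIPT_STYLE_RE.sub(" ", fragment)
    # A space (not "") keeps adjacent table cells from running together.
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)
    return " ".join(text.split())
```

Replacing each tag with a space is a trade-off: table cells stay separated, but a word split by inline markup (for example `<b>Ball</b>on`) comes out as two words.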

Upvotes: 1
