Reputation: 20222
I want to get Wikipedia pages as text.
I looked at the Wikipedia API documentation at https://en.wikipedia.org/w/api.php, which says that to get pages as text I need to append this to a page address:
api.php?action=query&meta=siteinfo&siprop=namespaces&format=txt
However, when I try appending this suffix to a normal page's address, the page is not found:
https://en.wikipedia.org/wiki/George_Washington/api.php?action=query&meta=siteinfo&siprop=namespaces&format=txt
Following the instructions from Get Text Content from mediawiki page via API, I tried appending /api.php?action=parse&page=test
to the end of the page URL, which gave me this:
https://en.wikipedia.org/wiki/George_Washington/api.php?action=parse&page=test
However, this doesn't work either.
Upvotes: 1
Views: 3223
Reputation: 13087
NB: All these examples are CORS-enabled.
Text only
From the precise title, as seen in the Wikipedia page URL:
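A minimal sketch of the text-only request, using the same query parameters as the fetch example further down in this answer (prop=extracts with explaintext). The helper names buildExtractUrl, firstExtract and fetchPlainText are made up for illustration, and the sketch assumes the default JSON layout, where pages are keyed by page ID.

```javascript
// Build a plain-text extract query URL for one page title.
function buildExtractUrl(title) {
  const params = new URLSearchParams({
    action: "query",
    prop: "extracts",
    explaintext: "1", // flag parameter: any value switches it on
    format: "json",
    origin: "*",      // allows anonymous cross-origin requests
    titles: title,
  });
  return "https://en.wikipedia.org/w/api.php?" + params.toString();
}

// Read the extract without knowing the page ID in advance.
// Assumes the default response shape: query.pages is an object keyed by ID.
function firstExtract(response) {
  const pages = Object.values(response.query.pages);
  return pages.length > 0 ? pages[0].extract : undefined;
}

// Hypothetical usage; needs a runtime that provides fetch.
async function fetchPlainText(title) {
  const res = await fetch(buildExtractUrl(title));
  return firstExtract(await res.json());
}
```

Calling fetchPlainText("Sokolsky_Opening") should resolve to the article text. A missing page has no extract field, so the result is undefined in that case.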
Search relevant pages by keywords
Get IDs, precise titles/URLs, and a quick text extract:
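One way to run such a keyword search is the list=search module of the query action. The sketch below is an assumption about which request was meant here, not necessarily the exact URL this answer linked to. buildSearchUrl and summarizeHits are hypothetical helper names.

```javascript
// Build a keyword search URL (list=search); limit caps the number of hits.
function buildSearchUrl(keywords, limit = 5) {
  const params = new URLSearchParams({
    action: "query",
    list: "search",
    srsearch: keywords,
    srlimit: String(limit),
    format: "json",
    origin: "*",
  });
  return "https://en.wikipedia.org/w/api.php?" + params.toString();
}

// Keep only the ID, the precise title and the short snippet of each hit.
// The snippet is an HTML fragment with the matched words highlighted.
function summarizeHits(response) {
  return response.query.search.map(hit => ({
    pageid: hit.pageid,
    title: hit.title,
    snippet: hit.snippet,
  }));
}
```

The title or pageid of a hit can then be passed to any of the other requests in this answer.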
Wiki page ID
Using the precise title:
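A rough sketch of the title-to-ID lookup: a plain action=query request with titles=... returns the page record keyed by its ID. The helper names are hypothetical, and the sketch assumes the default JSON layout, where a missing page shows up under a negative key.

```javascript
// Build a basic query URL that resolves a title to its page record.
function buildPageInfoUrl(title) {
  const params = new URLSearchParams({
    action: "query",
    titles: title,
    format: "json",
    origin: "*",
  });
  return "https://en.wikipedia.org/w/api.php?" + params.toString();
}

// Return the numeric page ID, or null when the title does not exist.
function pageIdFromResponse(response) {
  const keys = Object.keys(response.query.pages);
  if (keys.length === 0) return null;
  const id = Number(keys[0]);
  return id > 0 ? id : null; // missing pages are keyed by negative IDs
}
```

The returned ID can be used with pageid=... in the action=parse requests.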
Full HTML
By wiki page ID, includes the wikitext:
https://en.wikipedia.org/w/api.php?action=parse&origin=*&format=json&pageid=100017
Stripped HTML
Lighter HTML version, without the wikitext.
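A lighter response can be requested by limiting action=parse to prop=text, so that only the rendered HTML comes back. This is a sketch of that idea, with hypothetical helper names, and may not match the exact URL originally linked here. Depending on the formatversion in use, the HTML arrives either as a plain string or under a "*" key, so the reader below accepts both.

```javascript
// Build a parse URL that returns only the rendered HTML of a page.
function buildParsedHtmlUrl(pageId) {
  const params = new URLSearchParams({
    action: "parse",
    pageid: String(pageId),
    prop: "text", // restrict the output to the HTML body
    format: "json",
    origin: "*",
  });
  return "https://en.wikipedia.org/w/api.php?" + params.toString();
}

// Extract the HTML string from either response layout.
function htmlFromParse(response) {
  const text = response.parse.text;
  return typeof text === "string" ? text : text["*"];
}
```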
Cross origin:
With CORS requests, it may sometimes take two API calls, because you need one extra call to map between page ID and page title.
In an HTTPS context, we can use fetch to embed wiki text anywhere.
Example remote .json.
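// Fetch the plain-text extract and show it in the <pre> element below.
fetch("https://en.wikipedia.org/w/api.php?action=query&origin=*&prop=extracts&explaintext&format=json&titles=Sokolsky_Opening")
  .then(res => res.json())
  .then(data => {
    // The result is keyed by page ID; 100017 is the ID used for this page.
    document.getElementById("main").textContent = data.query.pages["100017"].extract;
  });
<pre id="main" style="white-space: pre-wrap"></pre>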
⚠️ This API has some quirks: content-heavy pages are sometimes truncated, and requests may be rate limited, among other things.
🧘 Good luck. 🜀
Upvotes: 3
Reputation: 7036
You have to use one of these formats: json, jsonfm, none, php, phpfm, rawfm, xml, or xmlfm, so txt is not a valid format. Also, your API link is wrong; use this:
https://en.wikipedia.org/w/api.php?action=query&titles=George_Washington&prop=revisions&rvprop=content&format=xml
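For use from a script, the same revisions query can be requested as JSON. The sketch below is an illustration under stated assumptions: it adds rvslots=main and formatversion=2, neither of which appears in the URL above, and the helper names are hypothetical. Note that this returns the page's wikitext markup, not stripped plain text.

```javascript
// Build a revisions query URL that returns the latest wikitext as JSON.
function buildWikitextUrl(title) {
  const params = new URLSearchParams({
    action: "query",
    titles: title,
    prop: "revisions",
    rvprop: "content",
    rvslots: "main",    // assumption: read the main content slot
    format: "json",
    formatversion: "2", // assumption: pages come back as an array
    origin: "*",
  });
  return "https://en.wikipedia.org/w/api.php?" + params.toString();
}

// Return the wikitext of the first page, or null if it has no revisions
// (for example, when the title does not exist).
function wikitextFromResponse(response) {
  const page = response.query.pages[0];
  if (!page || !page.revisions || page.revisions.length === 0) return null;
  return page.revisions[0].slots.main.content;
}
```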
Upvotes: 1