I want to get the content of a Wikipedia page, such as https://en.wikipedia.org/wiki/February_2, as JSON.
I tried using their API: https://en.wikipedia.org/w/api.php?action=parse&page=February_19&prop=text&formatversion=2&format=json
The response is in JSON format, but the content is HTML. I want only the content, so I need a way to get a clean result.
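For reference, the request above can be assembled from its parameters, which makes the page title easy to swap. With `formatversion=2`, the `parse` module returns the rendered page as a single HTML string in `parse.text`, which is why the JSON wrapper still contains markup. This is a minimal sketch: `buildParseUrl` and `getParsedHtml` are hypothetical helper names, and the sample object is a hand-written placeholder, not real API output.

```javascript
// Hypothetical helper: build the action=parse request URL from its parameters
function buildParseUrl(title) {
  const params = new URLSearchParams({
    action: "parse",
    page: title,
    prop: "text",
    formatversion: "2",
    format: "json",
  });
  return "https://en.wikipedia.org/w/api.php?" + params.toString();
}

// Hypothetical helper: with formatversion=2, parse.text is one HTML string
function getParsedHtml(data) {
  return data.parse.text;
}

// Hand-written placeholder with the same shape as a real response
const placeholder = { parse: { text: "<p>placeholder</p>" } };

console.log(buildParseUrl("February_19"));
console.log(getParsedHtml(placeholder));
```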
If you want plain text without markup, you first have to parse the JSON object and then extract the text from the HTML code:
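```javascript
// Convert an HTML string to plain text by letting the browser parse it
function htmlToText(html) {
  let tempDiv = document.createElement("div");
  tempDiv.innerHTML = html;
  return tempDiv.textContent || tempDiv.innerText || "";
}
// origin=* allows anonymous cross-origin (CORS) requests from the browser
const url = 'https://en.wikipedia.org/w/api.php?action=parse&page=February_19&prop=text&format=json&formatversion=2&origin=*';
$.getJSON(url, function(data) {
  // With formatversion=2, parse.text is the rendered HTML as a plain string
  const html = data['parse']['text'];
  const plainText = htmlToText(html);
  // Keep only lines that start with a four-digit year followed by an en dash
  const array = [...plainText.matchAll(/^\d{4} *–.*/gm)].map(x => x[0]);
  console.log(array);
});
```

```html
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
```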
Update: I edited the code above according to the comment below. Now it extracts every list item that starts with a four-digit year and puts them into an array.
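The year-line extraction in the callback works on plain text only, so it can be split out and checked without a browser or network access, for example in Node. This is a minimal sketch: `extractYearLines` is a hypothetical name, the sample text is made up, and, like the regex in the answer, it only matches years written with exactly four digits.

```javascript
// Hypothetical helper using the same pattern as the answer above.
// The m flag makes ^ match at the start of every line; "." stops at a newline,
// so each match is one line beginning with a four-digit year, optional spaces,
// and an en dash (U+2013).
function extractYearLines(plainText) {
  return [...plainText.matchAll(/^\d{4} *–.*/gm)].map((m) => m[0]);
}

// Made-up sample text for illustration
const sampleText = [
  "Events",
  "1600 – First event",
  "1700 – Second event",
  "Births",
  "197 – A three-digit year is not matched",
].join("\n");

console.log(extractYearLines(sampleText));
```

Years with fewer than four digits are skipped by this pattern; widening it to `\d{1,4}` would include them, at the risk of matching other lines that start with a number.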
I guess by "clean" you mean the source wikitext. In that case, you can use the revisions module.
See the MediaWiki documentation pages API:Get the contents of a page and API:Revisions for more info.
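A minimal sketch of the revisions approach, assuming the standard `action=query&prop=revisions` parameters with `rvprop=content`, `rvslots=main` and `formatversion=2`. In that response shape the wikitext sits at `query.pages[0].revisions[0].slots.main.content`. `buildRevisionsUrl` and `getWikitext` are hypothetical helper names, and the sample object is a hand-written stand-in for a real response.

```javascript
// Hypothetical helper: request the latest revision's wikitext for one page
function buildRevisionsUrl(title) {
  const params = new URLSearchParams({
    action: "query",
    prop: "revisions",
    titles: title,
    rvprop: "content",
    rvslots: "main",
    formatversion: "2",
    format: "json",
    origin: "*", // allow anonymous CORS requests from a browser
  });
  return "https://en.wikipedia.org/w/api.php?" + params.toString();
}

// Hypothetical helper: pull the wikitext out of a formatversion=2 response
function getWikitext(data) {
  const page = data.query.pages[0];
  if (page.missing) return null; // the title does not exist
  return page.revisions[0].slots.main.content;
}

// Hand-written stand-in for a real response
const fakeResponse = {
  query: {
    pages: [
      { title: "February 19", revisions: [{ slots: { main: { content: "'''February 19''' is ..." } } }] },
    ],
  },
};
console.log(getWikitext(fakeResponse));

// In a browser or Node 18+, the request itself could look like:
// fetch(buildRevisionsUrl("February_19"))
//   .then((res) => res.json())
//   .then((data) => console.log(getWikitext(data)));
```

Wikitext is itself markup (`'''bold'''`, `[[links]]`, templates), so it is "clean" only in the sense of being the page source rather than rendered HTML.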