Capstone

Reputation: 2282

How do I parse wikitext using MediaWiki's built-in support for Lua scripting?

The Wiktionary entry for faint is at https://en.wiktionary.org/wiki/faint

The wikitext for the etymology section is:

From {{inh|en|enm|faynt}}, {{m|enm|feynt||weak; feeble}}, from {{etyl|fro|en}} {{m|fro|faint}}, {{m|fro|feint||feigned; negligent; sluggish}}, past participle of {{m|fro|feindre}}, {{m|fro|faindre||to feign; sham; work negligently}}, from {{etyl|la|en}} {{m|la|fingere||to touch, handle, usually form, shape, frame, form in thought, imagine, conceive, contrive, devise, feign}}.

It contains various templates of the form {{xyz|...}}.
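Finding the template calls themselves does not need MediaWiki at all; only expanding them does. The JavaScript sketch below is a minimal illustration, not how MediaWiki's parser works: extractTemplates is a hypothetical helper, and it assumes templates are not nested and parameters contain no [[link|text]] pipes. That holds for the etymology line above but not for wikitext in general.

```javascript
// Minimal sketch: list the {{...}} template calls in a wikitext string and
// split each into a name and its pipe-separated parameters.
// Assumptions: templates are not nested, parameters contain no
// [[link|text]] pipes, and named key=value parameters are left unsplit.
function extractTemplates(wikitext) {
  const templates = [];
  const re = /\{\{([^{}]*)\}\}/g; // innermost {{...}} only
  let match;
  while ((match = re.exec(wikitext)) !== null) {
    const [name, ...params] = match[1].split("|");
    templates.push({
      source: match[0], // the full "{{...}}" text
      name: name.trim(), // e.g. "inh", "m", "etyl"
      params, // raw parameters; empty ones (the 3rd in {{m|enm|feynt||...}}) are kept
    });
  }
  return templates;
}

const etymology =
  "From {{inh|en|enm|faynt}}, {{m|enm|feynt||weak; feeble}}, from {{etyl|fro|en}} {{m|fro|faint}}.";
for (const t of extractTemplates(etymology)) {
  console.log(t.name, JSON.stringify(t.params));
}
// prints:
// inh ["en","enm","faynt"]
// m ["enm","feynt","","weak; feeble"]
// etyl ["fro","en"]
// m ["fro","faint"]
```

Each source string can then be sent to whatever expansion method is used, and the result substituted back into the text.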

I would like to parse them and get the text output as it appears on the page:

From Middle English faynt, feynt (“weak; feeble”), from Old French faint, feint (“feigned; negligent; sluggish”), past participle of feindre, faindre (“to feign; sham; work negligently”), from Latin fingere (“to touch, handle, usually form, shape, frame, form in thought, imagine, conceive, contrive, devise, feign”).

I have about 10,000 entries extracted from the freely available Wiktionary dumps.

To do this, my plan is to extract the templates and their expansions (in some form). To explore the possibilities, I've been experimenting with MediaWiki's Lua scripting facility, trying various queries in the debug console on the edit pages of modules, such as this one:

https://en.wiktionary.org/w/index.php?title=Module:languages/print&action=edit

mw.log(p)
>> table

mw.logObject(p)
>> table#1 {
  ["code_to_name"] = function#1,
  ["name_to_code"] = function#2,
}

p.code_to_name("aaa")
>>

p.code_to_name("ab")
>>

But I can't even get the function calls right: p.code_to_name("aaa") doesn't return anything.

The code that presumably expands the templates for the etymology section is here: https://en.wiktionary.org/w/index.php?title=Module:etymology/templates

How do I call this code correctly? Is there a simpler way to achieve my goal of parsing wikitext templates? Is there some function available in MediaWiki that I can call, like parse-wikitext("text")? If so, how do I invoke it?

Upvotes: 1

Views: 780

Answers (1)

cyclaminist

Reputation: 1807

To expand templates (and other markup) in wikitext, use frame:preprocess, which is called as a method on a frame object. To get a frame object, use mw.getCurrentFrame. For instance, typing = mw.getCurrentFrame():preprocess('{{l|en|word}}') in the console gives the wikitext resulting from {{l|en|word}}; the leading = makes the console print the value of the expression. That currently gives <span class="Latn" lang="en">[[word#English|word]]</span>.

You can also use the expandtemplates action of the MediaWiki API (https://en.wiktionary.org/w/api.php?action=expandtemplates&text={{l|en|word}}), the Special:ExpandTemplates page, or JavaScript, if you open the browser console while browsing a Wiktionary page. The JavaScript example below uses the parse action, which returns rendered HTML rather than wikitext:

new mw.Api().get({
    action: 'parse',
    text: '{{l|en|word}}',
    title: mw.config.get('wgPageName'),  // parse in the context of the current page
}).done(function (data) {
    // action=parse returns rendered HTML, not wikitext
    const html = data.parse.text['*'];
    if (html)
        console.log(html);
});

If the mediawiki.api module hasn't been loaded yet and you get a TypeError ("mw.Api is not a constructor"), load it first:

mw.loader.using("mediawiki.api", function() {
    // Use mw.Api here.
});
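The parse action returns HTML, while the question asks for the plain text shown on the page. One rough way to get there is to strip the markup. In the sketch below, htmlToText is a hypothetical helper (not part of mw.Api): it removes comments and tags with regular expressions and decodes numeric and a few named entities. That is only adequate for short fragments such as a single etymology line; in the browser, a real HTML parser such as DOMParser is the safer choice for anything larger.

```javascript
// Rough sketch: convert a small HTML fragment (e.g. data.parse.text['*'])
// to plain text. Regex-based, so it assumes simple, well-formed markup.
function htmlToText(html) {
  const named = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'", nbsp: "\u00a0" };
  return html
    .replace(/<!--[\s\S]*?-->/g, "") // HTML comments, e.g. the parser's limit report
    .replace(/<[^>]*>/g, "") // every remaining tag
    .replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (entity, body) => {
      if (body[0] === "#") {
        const isHex = body[1] === "x" || body[1] === "X";
        const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        return code <= 0x10ffff ? String.fromCodePoint(code) : entity;
      }
      return named[body] ?? entity; // unknown named entities are left as-is
    })
    .replace(/\s+/g, " ") // collapse whitespace, including decoded non-breaking spaces
    .trim();
}

console.log(
  htmlToText('<div class="mw-parser-output"><p>from <span class="Latn" lang="fro"><i>faint</i></span> (&#8220;weak&#8221;)</p></div>')
);
// prints: from faint (“weak”)
```

Entities are decoded in a single pass after the tags are gone, so an escaped &amp;lt; comes out as the literal text &lt; rather than being turned into a tag.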

So these are some of the ways to expand templates.
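For the roughly 10,000 dump entries in the question, calling the expandtemplates URL from a script is probably more practical than the browser console. The JavaScript sketch below assumes Node 18+ (global fetch) and uses hypothetical helper names; with prop=wikitext and formatversion=2 the result should be in data.expandtemplates.wikitext. It expands each distinct template call only once, which assumes a call renders the same way on every page. That is not guaranteed (for example, for templates that read the page title), so the title parameter can be passed when it matters. The User-Agent value is a placeholder, since Wikimedia's User-Agent policy asks scripts to identify themselves.

```javascript
// Sketch for Node 18+ (global fetch/URLSearchParams). Helper names are hypothetical.
const API = "https://en.wiktionary.org/w/api.php";

// Build an action=expandtemplates request; URLSearchParams percent-encodes {{ | }}.
function expandTemplatesUrl(text, title) {
  const params = new URLSearchParams({
    action: "expandtemplates",
    text,
    prop: "wikitext",
    format: "json",
    formatversion: "2",
  });
  if (title) params.set("title", title); // page context, e.g. for {{PAGENAME}}
  return `${API}?${params}`;
}

async function expandTemplates(text, title) {
  const response = await fetch(expandTemplatesUrl(text, title), {
    // Placeholder: replace with a real tool name and contact address.
    headers: { "User-Agent": "etymology-expander-sketch/0.1 (contact: you@example.org)" },
  });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  const data = await response.json();
  if (data.error) throw new Error(data.error.info);
  return data.expandtemplates.wikitext;
}

// Expand each distinct {{...}} call once (sequentially, to go easy on the API)
// and substitute the cached results back into every entry.
// Assumes non-nested templates that expand identically on every page.
async function expandAll(entries, expand = expandTemplates) {
  const templateRe = /\{\{[^{}]*\}\}/g;
  const cache = new Map();
  for (const entry of entries) {
    for (const call of entry.match(templateRe) ?? []) {
      if (!cache.has(call)) cache.set(call, await expand(call));
    }
  }
  return entries.map((entry) => entry.replace(templateRe, (call) => cache.get(call)));
}

// Usage (needs network access):
// expandAll(["from {{etyl|fro|en}} {{m|fro|faint}}"]).then(console.log);
```

The output is still expanded wikitext, so it contains markup such as <span> elements and [[links]] that needs a further conversion step to become plain text.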

Upvotes: 4
