OldGeeksGuide

Reputation: 2928

Wikipedia links API returns links that aren't on the page

I'm experimenting with the Python module wikipedia, which is a wrapper for the Wikipedia API. In particular I'm looking at the links API, which as I understand it should return a 'List of titles of Wikipedia page links on a page', i.e. all the references to other Wikipedia pages within the text of the page I'm querying. When I look at the result for the article on Google, I get a list of links as expected (Wikipedia titles in JSON format). The problem is that some of the links listed there do not seem to appear on the Google page. I thought it might be including pages that link to Google, but that doesn't hold up either: the third link returned in the JSON structure is ADATA, and I don't see a link to ADATA anywhere on the Google page, nor a link to Google anywhere on the ADATA page. Is this a bug, or am I missing something obvious?

I believe this link is enough to reproduce the issue:

https://en.wikipedia.org/w/api.php?action=query&titles=Google&prop=links

The result I see looks like this:

{
    "continue": {
        "plcontinue": "1092923|0|Aardvark_(search_engine)",
        "continue": "||"
    },
    "query": {
        "pages": {
            "1092923": {
                "pageid": 1092923,
                "ns": 0,
                "title": "Google",
                "links": [
                    {
                        "ns": 0,
                        "title": "111 Eighth Avenue"
                    },
                    {
                        "ns": 0,
                        "title": "2600: The Hacker Quarterly"
                    },
                    {
                        "ns": 0,
                        "title": "ADATA"
                    },
. . .
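
A side note on the response above: the "continue" block means the list is incomplete. By default prop=links returns only 10 titles per request, and the remainder has to be fetched by sending the "continue" values back with the next request. Below is a minimal sketch of that loop, not a definitive client. The names fetch_json and all_links and the User-Agent string are made up for illustration; the query parameters (pllimit, plcontinue) are the ones the MediaWiki API documents.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Send one GET request to the MediaWiki API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # The User-Agent value is a placeholder chosen for this example.
    request = urllib.request.Request(url, headers={"User-Agent": "links-example/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def all_links(title, fetch=fetch_json):
    """Collect every link title for a page by following the 'continue' block."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "links",
        "pllimit": "max",  # ask for as many titles per request as allowed
        "format": "json",
    }
    titles = []
    cont = {}
    while True:
        data = fetch({**params, **cont})
        for page in data.get("query", {}).get("pages", {}).values():
            titles.extend(link["title"] for link in page.get("links", []))
        if "continue" not in data:
            return titles
        # Merge the server-provided continuation values into the next request.
        cont = data["continue"]
```

Calling all_links("Google") would then return the complete list rather than only the first batch. The fetch argument exists so the loop can be tried against canned responses without touching the network.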

In Python you can reproduce it like this:

import wikipedia

# .links holds the titles of the pages the Google article links to
wikipedia.page('Google').links

which produces output like this:

['111 Eighth Avenue',
 '2600: The Hacker Quarterly',
 'ADATA',
 'AI Challenge',
 'AKM Semiconductor, Inc.',
 'AOL',
 'API.AI',

Upvotes: 3

Views: 465

Answers (2)

Tgr

Reputation: 28220

The list contains links which appear in the wikitext of the page or in templates called from the wikitext. It is updated by a queued job after every edit. Due to the async nature of job handling and the finite number of retries for failed jobs, it is possible for the list to differ from actual article content, but very unlikely. (It's probably possible to add links to wikitext in such a way that they don't show up in the article HTML at all, but again it's unlikely anyone would actually do that.)
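
To tell the two sources apart, one rough approach is to compare the API's list against the links written directly in the page's wikitext; whatever is left over most likely comes from a template. The sketch below assumes you already have the wikitext as a string (for example from action=parse with prop=wikitext). The helper names are made up for illustration, and the regex is a deliberate simplification: it ignores comments, nowiki sections and other edge cases that a real wikitext parser handles.

```python
import re

# Matches the target part of [[Target]], [[Target|label]] and [[Target#Section]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)")


def normalize(title):
    """Approximate MediaWiki title normalization for comparison purposes."""
    title = title.replace("_", " ").strip().lstrip(":").strip()
    return title[:1].upper() + title[1:]


def direct_wikilinks(wikitext):
    """Return the set of link targets written directly in the wikitext."""
    return {normalize(target) for target in WIKILINK.findall(wikitext) if target.strip()}


def template_only_links(api_links, wikitext):
    """Return the API-reported links that are not written directly in the wikitext."""
    direct = direct_wikilinks(wikitext)
    return [title for title in api_links if normalize(title) not in direct]
```

Titles returned by template_only_links are the ones to look for in transcluded templates such as navigation boxes.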

Upvotes: 2

OldGeeksGuide

Reputation: 2928

There seem to be some bits of the page that are not visible by default when visiting the page. In this example, the link appears when you click on the 'show' button for "Major information technology companies" at the bottom of the page. I believe this should account for what I'm seeing.
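
Searching the rendered HTML is a quicker check than clicking through every collapsed box, because collapsed sections are still present in the markup and are only hidden from view. Here is a minimal sketch, assuming the HTML comes from something like wikipedia.page('Google').html() and that internal links use the usual /wiki/Title form; the names WikiLinkCollector and linked_titles are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class WikiLinkCollector(HTMLParser):
    """Collect the page titles targeted by internal /wiki/ links."""

    def __init__(self):
        super().__init__()
        self.titles = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("/wiki/"):
            # Drop any #section fragment, then undo URL encoding and underscores.
            target = href[len("/wiki/"):].split("#", 1)[0]
            self.titles.add(unquote(target).replace("_", " "))


def linked_titles(html):
    """Return the set of titles linked from the HTML, hidden sections included."""
    collector = WikiLinkCollector()
    collector.feed(html)
    return collector.titles
```

With this, 'ADATA' in linked_titles(html) tells you whether the link is somewhere in the page markup, even if it sits inside a box that is collapsed by default.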

Thanks to zwer in the comments for pointing out where to find the link.

Upvotes: 0
