Welsy

Reputation: 77

Pywikibot - Find the source site of page image

I am going through the living people category on Wikipedia and collecting page images. The problem is that some images are stored on Wikimedia Commons, whereas others are stored on the English Wikipedia (wikipedia:en) itself. I want to know where each image is stored, including the case where it is stored somewhere other than en:wiki and Commons.

import pywikibot

enwiki = pywikibot.Site("en", "wikipedia")
commons = pywikibot.Site("commons","commons")
page1 = pywikibot.Page(enwiki, "50 Cent")
page2 = pywikibot.Page(enwiki, "0010x0010")
pageimage1 = page1.page_image()
pageimage2 = page2.page_image()
pageimage1.exists()  # outputs False (50 Cent page image is stored on commons)
pageimage2.exists()  # outputs True  (0010x0010 page image is stored on wikipedia:en)

This is fine: I can check Commons whenever .exists() on the Wikipedia file page outputs False. But I'm worried about the situation where the image is stored on a different site.

I've tried the Page.image_repository attribute, but it returns commons even when the page image does not exist there and is stored on wikipedia:en.

Is there a way to get the original site from the Page object? The only other way I know of is to download the HTML page and parse it, which is far too complicated.

Upvotes: 0

Views: 176

Answers (1)

xqt

Reputation: 333

As noted by Tgr, the best way is to use the FilePage.file_is_shared() method. To upcast the file page to the one on the shared repository, you can do:

import pywikibot

def repo_file(filepage):
    """Return a FilePage residing on repository."""
    if filepage.file_is_shared():
        filepage = pywikibot.FilePage(filepage.site.image_repository(), filepage.title())
    return filepage

Using your first sample it will work like this:

site = pywikibot.Site('wikipedia:en')
page1 = pywikibot.Page(site, '50 Cent')
page2 = pywikibot.Page(site, '0010x0010')
img1 = page1.page_image()
img2 = page2.page_image()

Test the site:

img1.site
img2.site

will give

APISite("en", "wikipedia")
APISite("en", "wikipedia")

Now upcast it:

img1 = repo_file(img1)
img2 = repo_file(img2)

Again test the site:

img1.site
img2.site

will give

APISite("commons", "commons")
APISite("en", "wikipedia")
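If you only need the hosting site and not a new FilePage, the same check can be reduced to a small helper. This is just a sketch: hosting_site is a name I made up, and it assumes the object passed in behaves like a pywikibot.FilePage (it has file_is_shared() and a site with image_repository()). It can only tell "local wiki" apart from "the shared repository configured for that wiki", nothing beyond that.

```python
def hosting_site(filepage):
    """Return the site object that hosts the file.

    Hypothetical helper: assumes `filepage` behaves like a
    pywikibot.FilePage, i.e. it provides file_is_shared() and a
    `site` attribute whose image_repository() returns a site.
    """
    if filepage.file_is_shared():
        # The file lives on the shared repository configured for this wiki.
        return filepage.site.image_repository()
    # Otherwise the file was uploaded to the local wiki.
    return filepage.site
```

With the sample above you would call hosting_site(page1.page_image()) and hosting_site(page2.page_image()), which should agree with the img1.site and img2.site values shown after the upcast.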

Upvotes: 1
