user2033412

Reputation: 2119

Get thumbnail image from Wikimedia Commons

I have a filename from Wikimedia Commons and I want to access the thumbnail image directly.

Example: Tour_Eiffel_Wikimedia_Commons.jpg

I found a way to get JSON data containing the URL of the thumbnail I want:

https://en.wikipedia.org/w/api.php?action=query&titles=Image:Tour_Eiffel_Wikimedia_Commons.jpg&prop=imageinfo&iiprop=url&iiurlwidth=200

but I don't want another request. Is there a way to access the thumbnail directly?
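For reference, the API request above can be assembled from its parameters. This is only a minimal sketch: the helper name `imageinfo_query_url` is hypothetical, and it builds the URL without sending any request.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def imageinfo_query_url(file_name, width=200):
    """Build the imageinfo query URL for a file name and thumbnail width.

    Hypothetical helper for illustration; it only formats the URL.
    """
    params = {
        "action": "query",
        "titles": "Image:" + file_name,
        "prop": "imageinfo",
        "iiprop": "url",
        "iiurlwidth": width,
    }
    # safe=":" keeps the colon in "Image:" unescaped, as in the URL above.
    return API_ENDPOINT + "?" + urlencode(params, safe=":")


print(imageinfo_query_url("Tour_Eiffel_Wikimedia_Commons.jpg"))
```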

Upvotes: 21

Views: 3621

Answers (3)

Alexa

Reputation: 1157

In case anyone is doing this in SPARQL instead of Python: SPARQL has an MD5 function, and the whole string manipulation can be implemented in SPARQL too.

  BIND(REPLACE(wikibase:decodeUri(STR(?image)), "http://commons.wikimedia.org/wiki/Special:FilePath/", "") as ?fileName) .
  BIND(REPLACE(?fileName, " ", "_") as ?safeFileName)
  BIND(MD5(?safeFileName) as ?fileNameMD5) .
  BIND(CONCAT("https://upload.wikimedia.org/wikipedia/commons/thumb/", SUBSTR(?fileNameMD5, 1, 1), "/", SUBSTR(?fileNameMD5, 1, 2), "/", ?safeFileName, "/650px-", ?safeFileName) as ?thumb)
 

This query can be run live in Wikidata's query service, as discussed here: https://discourse-mediawiki.wmflabs.org/t/accessing-a-commons-thumbnail-via-wikidata/499

Upvotes: 2

loomi

Reputation: 3096

Solution in Python based on the solution of @svick:

import hashlib

def get_wc_thumb(image, width=300):
    # image: file name (e.g. from Wikidata); width: thumbnail width in pixels
    image = image.replace(' ', '_')  # Commons file names use underscores instead of spaces
    d = hashlib.md5(image.encode('utf-8')).hexdigest()  # hex MD5 of the file name
    return "https://upload.wikimedia.org/wikipedia/commons/thumb/" + d[0] + '/' + d[0:2] + '/' + image + '/' + str(width) + 'px-' + image

Upvotes: 4

svick

Reputation: 244848

If you're willing to rely on the current way of building the URL not changing in the future (which is not guaranteed), then you can construct it yourself.

The URL looks like this:

https://upload.wikimedia.org/wikipedia/commons/thumb/a/a8/Tour_Eiffel_Wikimedia_Commons.jpg/200px-Tour_Eiffel_Wikimedia_Commons.jpg

  • The first part is always the same: https://upload.wikimedia.org/wikipedia/commons/thumb
  • The second part is the first character of the MD5 hash of the file name. In this case, the MD5 hash of Tour_Eiffel_Wikimedia_Commons.jpg is a85d416ee427dfaee44b9248229a9cdd, so we get /a.
  • The third part is the first two characters of the MD5 hash from above: /a8.
  • The fourth part is the file name: /Tour_Eiffel_Wikimedia_Commons.jpg
  • The last part is the desired thumbnail width, and the file name again: /200px-Tour_Eiffel_Wikimedia_Commons.jpg
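The five parts above can be sketched in Python as follows. This is a minimal illustration, not an official API: the function names are hypothetical, and the percent-encoding step via `urllib.parse.quote` is an added assumption for file names containing characters that are not URL-safe.

```python
import hashlib
from urllib.parse import quote

BASE = "https://upload.wikimedia.org/wikipedia/commons/thumb"


def thumb_url_parts(file_name, width):
    """Return the five URL parts described in the list above."""
    name = file_name.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Assumption: percent-encode the name so it is safe inside a URL.
    encoded = quote(name)
    return [
        BASE,                                   # fixed prefix
        "/" + digest[0],                        # first hex character of the MD5
        "/" + digest[:2],                       # first two hex characters
        "/" + encoded,                          # the file name
        "/" + str(width) + "px-" + encoded,     # width and file name again
    ]


def thumb_url(file_name, width=200):
    """Join the parts into a single thumbnail URL."""
    return "".join(thumb_url_parts(file_name, width))


print(thumb_url("Tour_Eiffel_Wikimedia_Commons.jpg", 200))
```

As noted above, this depends on the URL scheme staying the same, so the API query remains the more robust option.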

Upvotes: 28
