agou

Reputation: 738

Which files to download for all wikipedia images

I want to download all the Chinese Wikipedia data (text + images). I downloaded the articles, but I'm confused by these media files, and the remote-media files are ridiculously huge. What are they? Do I have to download them?

From: http://ftpmirror.your.org/pub/wikimedia/imagedumps/tarballs/fulls/20121104/

zhwiki-20121104-local-media-1.tar   4.1G
zhwiki-20121104-remote-media-1.tar  69.9G
zhwiki-20121104-remote-media-2.tar  71.1G
zhwiki-20121104-remote-media-3.tar  69.3G
zhwiki-20121104-remote-media-4.tar  48.9G

Thanks!

Upvotes: 3

Views: 2252

Answers (1)

Bergi

Reputation: 664307

I'd assume those are the media files included from Wikimedia Commons, which hosts most of the images used in the articles. From https://wikitech.wikimedia.org/wiki/Dumps/media:

For each wiki, we dump the image, imagelinks and redirects tables via /backups/imageinfo/wmfgetremoteimages.py. Files are written to /data/xmldatadumps/public/other/imageinfo/ on dataset2.

From the above we then generate the list of all remotely stored (i.e. on commons) media per wiki, using different args to the same script.

And ~263 GB in total isn't that huge for all the media files used by the Chinese Wikipedia :-)
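
If you decide you do need them, a minimal Python sketch for fetching the tarballs could look like the following. The mirror URL and tarball names are taken from your question; the fetch_remote flag and the rest are illustrative assumptions, not an official download tool.

import urllib.request

BASE = "http://ftpmirror.your.org/pub/wikimedia/imagedumps/tarballs/fulls/20121104/"

# Local media: files uploaded directly to zh.wikipedia.org (~4 GB).
names = ["zhwiki-20121104-local-media-1.tar"]

# Remote media: Commons-hosted files used by zhwiki (~260 GB).
# Only needed if you want every image in the articles to resolve offline.
fetch_remote = False  # illustrative flag; flip to True for the full set
if fetch_remote:
    names += ["zhwiki-20121104-remote-media-%d.tar" % i for i in range(1, 5)]

for name in names:
    print("downloading", name)
    urllib.request.urlretrieve(BASE + name, name)

For files this size you'd probably prefer a resumable downloader (e.g. wget -c) over urlretrieve, which restarts from scratch on any interruption.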

Upvotes: 1
