Reputation: 738
I want to download all of the Chinese Wikipedia data (text + images). I have downloaded the articles, but I'm confused by these media files, and the remote-media files are ridiculously huge. What are they? Do I have to download them?
From: http://ftpmirror.your.org/pub/wikimedia/imagedumps/tarballs/fulls/20121104/
zhwiki-20121104-local-media-1.tar 4.1G
zhwiki-20121104-remote-media-1.tar 69.9G
zhwiki-20121104-remote-media-2.tar 71.1G
zhwiki-20121104-remote-media-3.tar 69.3G
zhwiki-20121104-remote-media-4.tar 48.9G
Thanks!
Upvotes: 3
Views: 2252
Reputation: 664307
I'd assume that they are the media files included from Wikimedia Commons, which account for most of the images in the articles. From https://wikitech.wikimedia.org/wiki/Dumps/media:
For each wiki, we dump the image, imagelinks and redirects tables via /backups/imageinfo/wmfgetremoteimages.py. Files are written to /data/xmldatadumps/public/other/imageinfo/ on dataset2.
From the above we then generate the list of all remotely stored (i.e. on commons) media per wiki, using different args to the same script.
And that's not that huge for all of the media files used by the Chinese Wikipedia :-)
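If you do decide to grab them, here is a minimal sketch (Python 3, standard library only) of streaming one of the tarballs listed in the question to disk. The base URL and file name are taken from the question; everything else is just generic chunked downloading, not anything specific to the Wikimedia dump infrastructure.

    # Stream one of the media tarballs to disk without loading it into memory.
    import shutil
    import urllib.request

    BASE = "http://ftpmirror.your.org/pub/wikimedia/imagedumps/tarballs/fulls/20121104/"
    NAME = "zhwiki-20121104-local-media-1.tar"  # the ~4.1G local-media tarball

    with urllib.request.urlopen(BASE + NAME) as response, open(NAME, "wb") as out:
        # Copy in 1 MiB chunks so the multi-gigabyte file is never held in memory.
        shutil.copyfileobj(response, out, length=1024 * 1024)

For the ~70G remote-media tarballs you would probably want a downloader that can resume (e.g. wget -c), since a dropped connection partway through is likely.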
Upvotes: 1