Reputation: 235
I am trying to download an image from Wikimedia Commons by using a URL to a page in the file namespace:
wget http://commons.wikimedia.org/wiki/File:A_golden_tree_during_the_golden_season.JPG
All I get is a JPG file that I cannot open. When you open the link in a browser, you see the page rather than the image itself, but the page has a link called "Full resolution" that points to the real image: http://upload.wikimedia.org/wikipedia/commons/9/92/A_golden_tree_during_the_golden_season.JPG
How can I download this file when I only have the first link?
Upvotes: 3
Views: 1302
Reputation: 21
You can use the following link to retrieve it: https://upload.wikimedia.org/wikipedia/commons/9/92/A_golden_tree_during_the_golden_season.JPG I had the same problem. Click on the image and you will get the link above. I hope this helps.
Upvotes: 0
Reputation: 1321
Extract the title without the namespace (A_golden_tree_during_the_golden_season.JPG) and pass it to Special:Redirect.
wget http://commons.wikimedia.org/wiki/Special:Redirect/file/$( echo 'http://commons.wikimedia.org/wiki/File:A_golden_tree_during_the_golden_season.JPG' | sed 's/.*\/File\:\(.*\)/\1/g' )
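The same idea can be sketched without sed, using only shell parameter expansion. This is a minimal sketch, not a definitive implementation: the function name redirect_url is made up for illustration, and it assumes the input URL contains both /wiki/ and /File: as in the example above.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not part of any tool):
# turn a File: page URL into the matching Special:Redirect/file URL.
redirect_url() {
    page_url=$1
    # Drop everything up to and including "/File:" to get the bare title.
    title=${page_url##*/File:}
    # Keep the scheme and host of the original URL (everything before "/wiki/").
    base=${page_url%%/wiki/*}
    printf '%s/wiki/Special:Redirect/file/%s\n' "$base" "$title"
}

# Usage (needs network access, so it is left commented out here):
# wget "$(redirect_url 'http://commons.wikimedia.org/wiki/File:A_golden_tree_during_the_golden_season.JPG')"
```

Quoting the command substitution keeps titles with unusual characters from being split by the shell before wget sees them.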
Upvotes: 2
Reputation: 1458
You can try the following:
wget http://commons.wikimedia.org/wiki/File:A_golden_tree_during_the_golden_season.JPG -O output.html; wget $(cat output.html | grep fullMedia | sed 's/\(.*href="\/\/\)\([^ ]*\)\(" class.*\)/\2/g')
The first wget fetches the page you specify. I browsed a few pages and found that the high-resolution image links were inside a div with class=fullMedia. The second command parses the image URL out of that div and then fetches the image.
PS: As suggested above, bash is not a neat way of doing this. You should look at a tool that parses DOM trees.
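If you do stay in the shell, the extraction step can be pulled into a small function so it can be tried on saved HTML before touching the network. This is only a sketch under assumptions: the function name extract_full_media_url is made up, it relies on grep -o (available in GNU and BSD grep), and it assumes the fullMedia markup sits on one line with a protocol-relative href, as observed above. It is still text matching, not real DOM parsing.

```shell
#!/bin/sh
# Hypothetical helper: read HTML on stdin and print the first href found
# on a line that mentions "fullMedia".
extract_full_media_url() {
    url=$(grep 'fullMedia' \
        | grep -o 'href="[^"]*"' \
        | head -n 1 \
        | sed 's/^href="//; s/"$//')
    # Protocol-relative links ("//host/path") get an explicit scheme.
    case $url in
        //*) url="https:$url" ;;
    esac
    printf '%s\n' "$url"
}

# Usage (needs network access, so it is left commented out here):
# wget "$(wget -q -O - 'http://commons.wikimedia.org/wiki/File:A_golden_tree_during_the_golden_season.JPG' | extract_full_media_url)"
```

If the page layout changes, the function prints an empty line instead of a URL, so check the output before passing it to wget.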
Upvotes: 2
Reputation: 533
wget http://upload.wikimedia.org/wikipedia/commons/9/92/A_golden_tree_during_the_golden_season.JPG
You were fetching the web page, not the image itself.
Upvotes: 0