Reputation: 95
I wanted to make a tool to save images from a specific link, but encountered a problem.
My code is the following:
import urllib
urllib.urlretrieve(url, "img.jpg")
The thing is that if I use an image link from Google, it works flawlessly.
For example:
(source: asha.org)
But if I want to get this specific image:
(source: keepeek-cache.com)
It saves the file as .jpg, but when I try to open it I get an "unsupported file format" error. Any ideas on how to fix it, or what the reason behind it is?
Upvotes: 1
Views: 807
Reputation: 19770
The problem is that the website is blocking downloads based on the browser signature. Rename your img.jpg file to page.html and open it in a browser, then you will see something like this:
Error 1010 Ray ID: xxxxxxxxx • 2018-06-08 10:39:01 UTC
Access denied
What happened?
The owner of this website (asset.keepeek-cache.com) has banned your access based on your browser's signature (xxxxxxxxxx).
Cloudflare Ray ID: xxxxxxxxxx • Your IP: xx.xx.xx.xx • Performance & security by Cloudflare
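If you would rather not rename the file, looking at its first few bytes tells you the same thing: a real JPEG starts with the bytes FF D8 FF, while a saved error page starts with HTML text. Here is a rough sketch of that check, written for Python 3. The helper names (looks_like_jpeg, looks_like_html, describe_file) are my own and not part of any library, and the HTML test is only a heuristic.

```python
# First bytes of every JPEG file (start-of-image marker).
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if the data begins with the JPEG signature."""
    return data.startswith(JPEG_MAGIC)


def looks_like_html(data: bytes) -> bool:
    """Heuristic: return True if the data begins like an HTML document."""
    head = data[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def describe_file(path: str) -> str:
    """Classify a downloaded file as 'jpeg', 'html', or 'unknown'."""
    with open(path, "rb") as f:
        head = f.read(512)
    if looks_like_jpeg(head):
        return "jpeg"
    if looks_like_html(head):
        return "html"
    return "unknown"
```

Calling describe_file("img.jpg") on the file from the question should report "html" if the server sent back an error page instead of the image.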
Once you have considered whether you want to go against the website owner's wishes, you can change your user agent, for instance like this:
import urllib
# Change user agent to look like Firefox
urllib.URLopener.version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
# Download file with new user agent
urllib.urlretrieve(url, "img.jpg")
which fixed the problem for me.
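The snippet above is Python 2. In Python 3, urlretrieve lives in urllib.request and URLopener is deprecated, so the usual approach there is to pass the header through a Request object. Below is a minimal sketch of that, assuming Python 3; build_request and download are hypothetical helper names, and the user agent string is just the example one from above, not a value the site is known to require.

```python
import shutil
import urllib.request

# Example user agent string (same as above); any browser-like value is an assumption.
USER_AGENT = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) "
              "Gecko/20071127 Firefox/2.0.0.11")


def build_request(url, user_agent=USER_AGENT):
    """Create a request object that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, path, user_agent=USER_AGENT):
    """Fetch url with the custom user agent and write the body to path."""
    req = build_request(url, user_agent)
    # urlopen raises urllib.error.HTTPError on an error status such as 403,
    # so a blocked request fails loudly instead of saving an HTML error page.
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Usage would be download(url, "img.jpg"), with url being the image link from the question.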
Upvotes: 1