Reputation: 3275
The problem I am currently having is trying to download an image that displays as an animated gif but appears to be encoded as a jpg. I say that it appears to be encoded as a jpg because the file extension and MIME type are .jpg and image/jpeg, respectively.
When I download the file to my local machine (Mac OS X) and then try to open it, I get the error:
The file could not be opened. It may be damaged or use a file format that Preview doesn’t recognize.
While I realize that some people would maybe just ignore that image, if it can be fixed, I'm looking for a solution to do that, not just ignore it.
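Since the extension and MIME type may not match what was actually saved, one way to check is to look at the first bytes of the downloaded file. The following is a minimal sketch, not a full format detector: the function names `sniff_format` and `sniff_file` are made up for illustration, and only a few well-known signatures are covered.

```python
def sniff_format(data: bytes) -> str:
    """Guess the real format from the leading bytes, ignoring extension and MIME type."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # An HTML page saved under an image name starts with markup, not a signature
    text_start = data[:256].lstrip().lower()
    if text_start.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first bytes of a file on disk and classify them."""
    with open(path, "rb") as fh:
        return sniff_format(fh.read(256))
```

If `sniff_file` reports "html" or "unknown" for a file saved with a .jpg name, the download probably did not return image data at all.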
The url in question is here:
http://www.supergrove.com/wp-content/uploads/2017/03/gif-images-22-1000-about-gif-on-pinterest.jpg
Here is my code, and I am open to suggestions:
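from PIL import Image
import requests

# media is the image URL above; uploadedFile is the local path to save to
response = requests.get(media, stream=True)
response.raise_for_status()
with open(uploadedFile, 'wb') as img:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:
            img.write(chunk)
# No explicit img.close() needed: the with block closes the file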
Upvotes: 1
Reputation: 3275
I had to answer my own question in this case. The fix was to add a Referer header to the request. Most likely an .htaccess rule on the image's server blocks direct access to the file unless the request appears to come from their own site.
from fake_useragent import UserAgent
import requests

# Set url
mediaURL = 'http://www.supergrove.com/wp-content/uploads/2017/03/gif-images-22-1000-about-gif-on-pinterest.jpg'
# Create a user agent
ua = UserAgent()
# Create a request session
s = requests.Session()
# Set some headers for the request (the HTTP header is spelled 'Referer')
s.headers.update({'User-Agent': ua.chrome, 'Referer': mediaURL})
# Make the request to get the image from the url
response = s.get(mediaURL, allow_redirects=False)
# The request was about to be redirected
if response.status_code == 302:
    # Get the next location that we would have been redirected to
    location = response.headers['Location']
    # Set the previous page url as referer
    s.headers.update({'Referer': location})
    # Try the request again, this time with a referer
    response = s.get(mediaURL, allow_redirects=False, cookies=response.cookies)
print(response.headers)
Hat tip to @raratiru for suggesting the use of allow_redirects.
Also noted in their answer is that the image's server might be intentionally blocking access to prevent general scrapers from viewing their images. Hard to tell, but regardless, this solution works.
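The retry-with-referer idea can also be wrapped in a small helper. This is only a sketch under assumptions: `fetch_with_referer` and `REDIRECT_CODES` are hypothetical names, the `session` argument is expected to behave like a `requests.Session`, and there is no guarantee that every server accepts the redirect target as a referer.

```python
REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_with_referer(session, url):
    """Fetch url without following redirects; if redirected, retry once
    with the redirect target sent as the Referer header."""
    response = session.get(url, allow_redirects=False)
    if response.status_code in REDIRECT_CODES:
        location = response.headers.get("Location")
        if location:
            response = session.get(
                url, headers={"Referer": location}, allow_redirects=False
            )
    return response


# Usage with the third-party requests library (not run here):
#   import requests
#   response = fetch_with_referer(requests.Session(), mediaURL)
#   if response.status_code == 200:
#       with open("image.gif", "wb") as fh:  # "image.gif" is a placeholder path
#           fh.write(response.content)
```

Taking the session as a parameter keeps the helper independent of how the headers and cookies were set up beforehand.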
Upvotes: 1
Reputation: 9616
According to Wheregoes, the link of the image:
http://www.supergrove.com/wp-content/uploads/2017/03/gif-images-22-1000-about-gif-on-pinterest.jpg
receives a 302 redirect to the page that contains it:
http://www.supergrove.com/gif-images/gif-images-22-1000-about-gif-on-pinterest/
Therefore, your code is trying to download a web page as an image.
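A quick way to catch this case before writing anything to disk is to check the status code and the Content-Type header of the final response. The sketch below is illustrative only: `is_image_response` is a made-up helper name, and it trusts whatever Content-Type the server reports.

```python
def is_image_response(status_code: int, content_type: str) -> bool:
    """Return True only for a 200 response whose Content-Type is image/*."""
    # Drop parameters such as "; charset=UTF-8" before comparing
    media_type = content_type.split(";", 1)[0].strip().lower()
    return status_code == 200 and media_type.startswith("image/")


# With requests (third-party, not run here), assuming r is a Response:
#   ok = is_image_response(r.status_code, r.headers.get("Content-Type", ""))
```

When redirects are followed, a web page served in place of the image would typically arrive as text/html and fail this check.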
I tried:
r = requests.get(the_url, headers=headers, allow_redirects=False)
But it returns zero content and status_code = 302.
(In hindsight, that was to be expected.)
This server is configured in a way that it will never fulfill that request.
Bypassing that limitation sounds difficult, to the best of my (perhaps limited) knowledge.
Upvotes: 1