Reputation: 6217
I'm trying to download an image from a URL using requests. Using a browser or a REST client, such as the Restlet Chrome extension, I can retrieve the normal content, a JSON body and a binary image that I can save to disk.
Using requests, I get almost the same response headers; only Content-Length has a different value (15 bytes instead of 35 kilobytes), and I can't find the binary image.
Trying to simulate the request made by the browser, I configured the same request headers, like this:
headers = {"Host": "cpom.prefeitura.sp.gov.br",
"Pragma": "no-cache",
"Cache-Control": "no-cache",
"DNT": "1",
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.9,pt;q=0.8",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/65.0.3325.181 Safari/537.36"
}
r = requests.get(url, stream=True, headers=headers)
There are no redirects. I also debugged and looked at the content of requests.models.Response, but with no success.
What am I missing? I think it is a detail of the request, but I can't figure it out.
This is my test:
import requests

url = "https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral/ImagemCaptcha?u=8762520"
r = requests.get(url, stream=True)
if r.status_code == 200:
    print(r.raw.headers)
    # Saving as text, since the body is not the PNG image
    with open("/home/bruno/captcha/8762520.txt", "wb") as f:
        for chunk in r:
            f.write(chunk)
This is the URL to download the image: https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral/ImagemCaptcha?u=4067913
And this is the site with the captcha image: https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral
A simple GET returns only a JSON response body, but inspecting the response you'll see the binary response, which is the image (about 36 KB).
EDIT: include images from restlet client
Upvotes: 0
Views: 1480
Reputation: 59731
The difference is in the Cookie header. Restlet makes use of Chrome's existing cookies by default (see docs), but if you set the Cookie header to an empty string you will see that you do not get the image. If you want to retrieve the image from a Python script, you first need to obtain a valid cookie by making a request to another valid URL in the web app (for example, the link with the form that you posted) and reading the Set-Cookie header of its response (see MDN docs for more information).
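As a rough sketch of that idea, a requests.Session can carry the cookie for you: it stores whatever the server sends in Set-Cookie on the first request and sends it back on the next one. The helper name fetch_captcha, the timeout value and the output file name captcha.png are my own choices for illustration. I am also assuming that the form page is enough to obtain a valid cookie and that the u value from the question is still accepted, neither of which I have verified against this site.

```python
def fetch_captcha(session, form_url, image_url, timeout=10):
    """Return the bytes served at image_url, after first visiting form_url.

    `session` is expected to behave like requests.Session: cookies set by
    the first response are stored and sent with the second request.
    """
    # First request: lets the server set its cookie(s) on the session.
    page = session.get(form_url, timeout=timeout)
    page.raise_for_status()

    # Second request: same session, so the stored cookies are sent along.
    image = session.get(image_url, timeout=timeout)
    image.raise_for_status()
    return image.content


def main():
    import requests  # third-party package: pip install requests

    # URLs taken from the question; whether they still work is an assumption.
    form_url = "https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral"
    image_url = form_url + "/ImagemCaptcha?u=8762520"

    with requests.Session() as session:
        data = fetch_captcha(session, form_url, image_url)
        # Show which cookies the server actually set, for debugging.
        print(session.cookies.get_dict())

    # Hypothetical output path; adjust as needed.
    with open("captcha.png", "wb") as f:
        f.write(data)
```

Calling main() runs it against the live site. If the body is still only a few bytes, compare the cookies printed here with the ones Chrome sends; the server may expect more than the session cookie.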
Upvotes: 1