Bruno Ribeiro

Reputation: 6217

Different response using Python requests

I'm trying to download an image from a URL using requests. Using a browser or a REST client, such as the Restlet Chrome extension, I can retrieve the normal content, a JSON body, and a binary image that I can save to disk.

With requests, I get almost the same response headers. Only Content-Length has a different value (15 bytes instead of 35 kilobytes), and I can't find the binary image.

To simulate the request made by the browser, I set the same request headers, like this:

headers = {"Host": "cpom.prefeitura.sp.gov.br",
           "Pragma": "no-cache",
           "Cache-Control": "no-cache",
           "DNT": "1",
           "Accept": "*/*",
           "Accept-Encoding": "gzip, deflate, br",
           "Accept-Language": "en-US,en;q=0.9,pt;q=0.8",
           "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                         "AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/65.0.3325.181 Safari/537.36"
           }

r = requests.get(url, stream=True, headers=headers)

There are no redirects. I also debugged and inspected the contents of requests.models.Response, but with no success.

What am I missing? I think it is a detail of the request, but I can't figure it out.

This is my test:

import requests

url = "https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral/ImagemCaptcha?u=8762520"
r = requests.get(url, stream=True)

if r.status_code == 200:
    print(r.raw.headers)
    # saved as text, since the body is not the PNG image
    with open("/home/bruno/captcha/8762520.txt", "wb") as f:
        for chunk in r:
            f.write(chunk)

This is the URL to download the image: https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral/ImagemCaptcha?u=4067913

And this is the page with the captcha image: https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral

With a simple GET you get only a JSON response body, but if you inspect the response you'll see the binary response, which is the image (~36 KB).

EDIT: added images from the Restlet client

Request: Request sample

Response: Partial response

Upvotes: 0

Views: 1480

Answers (1)

javidcf

Reputation: 59731

The difference is in the Cookie header. Restlet uses Chrome's existing cookies by default (see docs), and if you set the Cookie header to an empty string there, you will see that you do not get the image either. If you want to retrieve the image from a Python script, you first need to obtain a valid cookie by making a request to another valid URL in the web app (for example, the page with the form that you linked) and looking at the Set-Cookie response header (see MDN docs for more information).
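As a rough sketch of that approach, a requests.Session can carry the cookie from the first request over to the second. This assumes the server only needs the cookie set by the form page, which I have not verified against the site. The names fetch_captcha and save_captcha and the output path are made up for illustration; the URLs are the ones from the question.

```python
import requests

# URLs taken from the question.
PAGE_URL = "https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral"
IMAGE_URL = (
    "https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral/"
    "ImagemCaptcha?u=8762520"
)


def fetch_captcha(session, page_url, image_url):
    """Visit the form page first, then request the image with the same session."""
    # Any cookie the server sends via Set-Cookie is stored on the session.
    page = session.get(page_url)
    page.raise_for_status()

    # The session sends the stored cookies back automatically on this request.
    image = session.get(image_url)
    image.raise_for_status()
    return image.content


def save_captcha(path):
    """Hypothetical helper: fetch the captcha bytes and write them to `path`."""
    with requests.Session() as session:
        data = fetch_captcha(session, PAGE_URL, IMAGE_URL)
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

Calling save_captcha("captcha.png") returns the number of bytes written. If that is still around 15 bytes, the server expects something more than the cookie, and it is worth comparing the Cookie header the browser sends with the contents of session.cookies.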

Upvotes: 1
