Pythonist

Reputation: 2679

Getting image URL embedded in <script> tag with Python requests

I'm trying to use Python requests to get the URL of an image on this web page. Specifically, I'd like to get the URL of the image whose file name starts with PPI_Z_005...

To do this, I fetch the HTML with Python requests:

import requests

weburl = "https://smn.conagua.gob.mx/tools/GUI/visor_radares_v2/radares/cabos/cabos_ppi.php"
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
           '(KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
response = requests.get(weburl, verify=False, headers=headers)

The problem is that the response has no explicit reference to the file name I'm looking for, such as an <img> tag. I guess the image is somehow rendered by JavaScript, with the file name passed in from inside a <script> tag. Indeed, when I inspect the page source with the browser's developer tools, it contains this:

  <script>
    [...]
    imagen_eco(/* Radar */ 'cabos', /* Nombre imagen */ "PPI_Z_005_300_20220206141529.png", /* Producto */ 'ppi', /* Limites */ [[25.589004,-112.910417],[20.147021,-106.944245]]);
    [...]
  </script>

I guess this tag is somehow responsible for inserting the image into the rendered webpage... but how?

Is it possible to use requests alone to parse this page and obtain this file name?

NOTE: I'm aware that this can be accomplished using selenium. I'm specifically looking for a selenium-free solution.

Upvotes: 1

Views: 460

Answers (1)

makaramkd

Reputation: 46

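Here is a solution that uses requests and a regular expression to find the file name you are looking for. It searches the downloaded HTML for the imagen_eco(...) call and then pulls the .png name out of its arguments, so no JavaScript has to be executed.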

import requests
import re


weburl = (
    "https://smn.conagua.gob.mx/tools/GUI/visor_radares_v2/radares/cabos/cabos_ppi.php"
)
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"
    "(KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"
}
response = requests.get(weburl, verify=False, headers=headers)

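source = response.content.decode("utf-8")

# Escape the parentheses so they match the literal call imagen_eco(...);
imagen_eco = re.search(r"imagen_eco\((.*?)\);", source)
if not imagen_eco:
    raise SystemExit("Not found")

# The image file name is the .png argument of that call
image_name = re.search(r"([\w-]+)\.png", imagen_eco.group(0))
if not image_name:
    raise SystemExit("Not found")
print(image_name.group(0))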
print(
    f"https://smn.conagua.gob.mx/tools/GUI/visor_radares_v2/ecos/cabos/ppi/{image_name.group(0)}"
)
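
If you want to check the extraction step without hitting the server, the same idea can be wrapped in a small function and run against the snippet from the question. This is only a sketch: the function name extract_image_name and the BASE_URL constant are mine, and the base URL is simply copied from the print statement above, so treat it as an assumption about where the site serves the images.

```python
import re
from typing import Optional

# Assumed base URL, copied from the print statement in the answer above.
BASE_URL = "https://smn.conagua.gob.mx/tools/GUI/visor_radares_v2/ecos/cabos/ppi/"

# Matches a literal call imagen_eco(...); and captures its argument list.
CALL_RE = re.compile(r"imagen_eco\((.*?)\);", re.DOTALL)
# Matches a quoted .png file name inside the argument list.
PNG_RE = re.compile(r"""["']([\w-]+\.png)["']""")


def extract_image_name(html: str) -> Optional[str]:
    """Return the first PNG file name passed to imagen_eco(), or None."""
    for call in CALL_RE.finditer(html):
        png = PNG_RE.search(call.group(1))
        if png:
            return png.group(1)
    return None


# Sample taken from the <script> tag quoted in the question.
sample = '''
<script>
  imagen_eco(/* Radar */ 'cabos', /* Nombre imagen */ "PPI_Z_005_300_20220206141529.png", /* Producto */ 'ppi', /* Limites */ [[25.589004,-112.910417],[20.147021,-106.944245]]);
</script>
'''

name = extract_image_name(sample)
print(name)
if name:
    print(BASE_URL + name)
```

With the live page you would pass response.text (or the decoded content) instead of sample. Returning None rather than exiting lets the caller decide what to do when the page layout changes.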

Upvotes: 2
