shmulik90

Reputation: 63

Scraping a Java Web-page

I have found and read quite a few articles about scraping, but as a beginner I am somewhat overwhelmed. I want to get data from a table (https://www.senamhi.gob.pe/mapas/mapa-estaciones/_dat_esta_tipo.php?estaciones=472CA750).

I experimented with BeautifulSoup and can get a list of the available option tags (see option_tags in the code below).

I am now struggling to get the actual content: how do I access the table for each date/option and save it into, e.g., a pandas DataFrame?

Any advice on where to begin?

Here is my code to get the options:

from bs4 import BeautifulSoup
import requests
resp = requests.get("https://www.senamhi.gob.pe/mapas/mapa-estaciones/_dat_esta_tipo.php?estaciones=472CA750")

html = resp.content 
soup = BeautifulSoup(html)

option_tags = soup.find_all("option")

Upvotes: 0

Views: 290

Answers (1)

Omer Tekbiyik

Reputation: 4744

Looking at the URL you gave, I think the table is embedded in the page through this iframe:

 <iframe src="_dat_esta_tipo02.php?estaciones=472CA750&tipo=SUT&CBOFiltro=201902&t_e=M" name="contenedor" width="600" marginwidth="0" height="560" marginheight="0" scrolling="NO" align="center"  frameborder="0" id="interior"></iframe>

If you open that src directly (it is relative to https://www.senamhi.gob.pe/mapas/mapa-estaciones/), the page that opens shows the same table, so you can scrape that page instead. I tried it and it gives the correct result.
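A minimal sketch of how that iframe address could be found in code rather than copied by hand. It parses a shortened copy of the iframe tag above (so it runs without a network request) and resolves the relative src against the page URL with urllib.parse.urljoin. The helper name iframe_url and the constants are made up for illustration; in real use you would pass it resp.text from the question's request.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

PAGE_URL = "https://www.senamhi.gob.pe/mapas/mapa-estaciones/_dat_esta_tipo.php?estaciones=472CA750"

# Shortened copy of the iframe tag quoted above, used instead of a live request.
SAMPLE_HTML = (
    '<iframe src="_dat_esta_tipo02.php?estaciones=472CA750&tipo=SUT'
    '&CBOFiltro=201902&t_e=M" name="contenedor" id="interior"></iframe>'
)


def iframe_url(html, base_url):
    """Return the absolute URL of the first iframe in html, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    frame = soup.find("iframe")
    if frame is None or not frame.get("src"):
        return None
    # urljoin replaces the last path segment of base_url with the relative src
    return urljoin(base_url, frame["src"])


print(iframe_url(SAMPLE_HTML, PAGE_URL))
```

The printed address is the one to pass to requests.get in place of the original page URL.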

**Full code:**

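from bs4 import BeautifulSoup
import requests

# Request the iframe's page directly; the URL must stay on a single line
resp = requests.get("https://www.senamhi.gob.pe/mapas/mapa-estaciones/_dat_esta_tipo02.php?estaciones=472CA750&tipo=SUT&CBOFiltro=201902&t_e=M")

html = resp.content
soup = BeautifulSoup(html, "lxml")  # pass a parser explicitly: "lxml" or "html.parser"

# 'aling' (sic) is kept as written, since this selector produced the output below
option_tags = soup.find_all("tr", attrs={'aling': 'center'})

for a in option_tags:
    print(a.find('div').text)  # text of the first <div> in each row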

OUTPUT :

Día/mes/año
Prom
01-02-2019
02-02-2019
03-02-2019
04-02-2019
05-02-2019
06-02-2019
07-02-2019
08-02-2019
09-02-2019
10-02-2019
11-02-2019
12-02-2019
13-02-2019
14-02-2019
15-02-2019
16-02-2019
17-02-2019
18-02-2019

The code above gets only the dates. If you want all the values for each date, create a list and append each row to it. Only the loop changes:

# Collect the whitespace-separated values of every row
array = []
for a in option_tags:
    array.append(a.text.split())

print(array)
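Since the question asks for a pandas DataFrame, here is a rough sketch of one way to go from the matched rows to a frame. It runs on a made-up HTML snippet, so the column names and values are placeholders rather than real SENAMHI data, and the helper name rows_to_frame is hypothetical. It assumes every row has the same number of cells and that the first row is the header.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up stand-in for the station page: column names and values are
# placeholders, not real SENAMHI data.
SAMPLE_HTML = """
<table>
  <tr aling="center"><td><div>Date</div></td><td><div>Value</div></td></tr>
  <tr aling="center"><td><div>01-02-2019</div></td><td><div>1.0</div></td></tr>
  <tr aling="center"><td><div>02-02-2019</div></td><td><div>2.5</div></td></tr>
</table>
"""


def rows_to_frame(html):
    """Build a DataFrame from the matched rows, using the first row as the header."""
    soup = BeautifulSoup(html, "html.parser")
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all("td")]
        for tr in soup.find_all("tr", attrs={"aling": "center"})
    ]
    if not rows:
        return pd.DataFrame()
    header, *body = rows
    return pd.DataFrame(body, columns=header)


df = rows_to_frame(SAMPLE_HTML)
print(df)
```

On the real page the output above suggests there are two header rows (Día/mes/año, then Prom), so the header handling would likely need adjusting before the columns line up. To cover other months, the same function could be called once per CBOFiltro value in the URL and the frames combined with pd.concat.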

Upvotes: 1
