Reputation: 63
I have read quite a few articles about scraping, but as a beginner I am somewhat overwhelmed. I want to get data from a table (https://www.senamhi.gob.pe/mapas/mapa-estaciones/_dat_esta_tipo.php?estaciones=472CA750)
I experimented with BeautifulSoup and can get a list of the available option tags (see option_tags in the code below).
I am now struggling to get the actual content: how do I access the table for each date/option and save it into, e.g., a pandas DataFrame?
Any advice on where to begin?
Here is my code to get the options:
from bs4 import BeautifulSoup
import requests
resp = requests.get("https://www.senamhi.gob.pe/mapas/mapa-estaciones/_dat_esta_tipo.php?estaciones=472CA750")
html = resp.content
soup = BeautifulSoup(html)
option_tags = soup.find_all("option")
Upvotes: 0
Views: 290
Reputation: 4744
Looking at the URL you gave, I think the table is embedded in the page through this iframe:
<iframe src="_dat_esta_tipo02.php?estaciones=472CA750&tipo=SUT&CBOFiltro=201902&t_e=M" name="contenedor" width="600" marginwidth="0" height="560" marginheight="0" scrolling="NO" align="center" frameborder="0" id="interior"></iframe>
If you open the iframe's src (https://www.senamhi.gob.pe/mapas/mapa-estaciones/_dat_esta_tipo02.php?estaciones=472CA750&tipo=SUT&CBOFiltro=201902&t_e=M), a page opens that shows the same table, so you can scrape that page instead. I tried it and it gives the correct result.
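Since the month appears in the CBOFiltro parameter of that src, the URL for each month can probably be built from the option values the question already collects. This is only a minimal sketch: the helper name `build_month_url` is made up for illustration, and it assumes each option's value is a YYYYMM string like the 201902 in the iframe, which I have not verified for every option.

```python
from urllib.parse import urlencode

# Base of the iframe src shown above (query string removed)
BASE_URL = "https://www.senamhi.gob.pe/mapas/mapa-estaciones/_dat_esta_tipo02.php"


def build_month_url(station, month, tipo="SUT", t_e="M"):
    """Build the iframe URL for one station and one month (hypothetical helper).

    The parameter names come from the iframe src; the default values for
    tipo and t_e are simply copied from that same src.
    """
    params = {"estaciones": station, "tipo": tipo, "CBOFiltro": month, "t_e": t_e}
    return BASE_URL + "?" + urlencode(params)


# Assumed option values; in practice take them from the option tags,
# e.g. [o.get("value") for o in option_tags]
months = ["201901", "201902"]
urls = [build_month_url("472CA750", m) for m in months]
for url in urls:
    print(url)
```

Each of these URLs can then be requested and parsed the same way as the single page.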
**Full code:**
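from bs4 import BeautifulSoup
import requests

# Request the iframe page directly, since it contains the table itself
resp = requests.get("https://www.senamhi.gob.pe/mapas/mapa-estaciones/_dat_esta_tipo02.php?estaciones=472CA750&tipo=SUT&CBOFiltro=201902&t_e=M")
html = resp.content
soup = BeautifulSoup(html, "lxml")  # pass a parser explicitly: "lxml" or "html.parser"
# 'aling' (sic) is kept as written here; presumably it matches the page's own markup
option_tags = soup.find_all("tr", attrs={'aling': 'center'})
for a in option_tags:
    # the first div in each row holds the date
    print(a.find('div').text)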
Output:
Día/mes/año
Prom
01-02-2019
02-02-2019
03-02-2019
04-02-2019
05-02-2019
06-02-2019
07-02-2019
08-02-2019
09-02-2019
10-02-2019
11-02-2019
12-02-2019
13-02-2019
14-02-2019
15-02-2019
16-02-2019
17-02-2019
18-02-2019
The code above only gets the dates. If you want all the values for each date, create a list and append each row's values to it. Just change the loop as follows:
array = []
for a in option_tags:
    # split the row's text on whitespace into its individual values
    array.append(a.text.split())
print(array)
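To get from those rows to the pandas DataFrame the question asks about, something like the following might work. It is a rough sketch, not tested against the live site: `SAMPLE_HTML` is an invented stand-in for the iframe page (the column names value_1/value_2 and the numbers are placeholders), and `rows_to_dataframe` is a hypothetical helper. It reads the cells one by one instead of splitting the text, so values containing spaces stay intact.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Invented sample markup, NOT the real page; it only mimics rows
# marked with the 'aling' attribute used in the code above.
SAMPLE_HTML = """
<table>
  <tr aling="center"><td><div>Día/mes/año</div></td><td><div>value_1</div></td><td><div>value_2</div></td></tr>
  <tr aling="center"><td><div>01-02-2019</div></td><td><div>20.1</div></td><td><div>5.3</div></td></tr>
  <tr aling="center"><td><div>02-02-2019</div></td><td><div>19.4</div></td><td><div>4.8</div></td></tr>
</table>
"""


def rows_to_dataframe(html):
    """Turn the matching table rows into a DataFrame (hypothetical helper).

    Assumes the first matching row is the header and that every other
    row has the same number of cells as the header.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all("td")]
        for tr in soup.find_all("tr", attrs={"aling": "center"})
    ]
    header, *body = rows
    return pd.DataFrame(body, columns=header)


df = rows_to_dataframe(SAMPLE_HTML)
print(df)
```

Judging by the output above, the real page seems to have more than one header row (both "Día/mes/año" and "Prom" appear before the first date), so the header handling will likely need adjusting for the actual table.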
Upvotes: 1