Reputation: 33
I am struggling to scrape a specific row from this website.
First of all, the table element has no class, but I think I have a workaround for that.
My problem is that I want to print (or store in a variable, or otherwise access) the data of a specific row, say the row whose first value is "Bollate".
So I wrote:
import requests
import bs4
URL = "http://www.centrometeolombardo.com/content.asp?CatId=332&ContentType=Dati"
response = requests.get(URL)
soup = bs4.BeautifulSoup(response.text, "lxml")
table = soup.find(text="Bollate").find_parent("table")
for a in table:
    if a.text == "Bollate":
        for val in a.parent.find_next_siblings():
            print(val.text)
But I keep getting:
Traceback (most recent call last):
  File "/home/pi/Documents/Python/ngu.py", line 12, in <module>
    if a.text == "Bollate":
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 370, in __getattr__
    self.__class__.__name__, attr))
AttributeError: 'NavigableString' object has no attribute 'text'
This suggests I am doing something wrong, since I am getting an object that has no text attribute, but I do not know how to get around the problem.
Thanks all
Upvotes: 3
Views: 1489
Reputation: 84465
You can isolate the row with :contains and :has, which together ensure that the tr contains a b tag with that text. You also need to target the right nested table, e.g. with :nth-child.
import requests
from bs4 import BeautifulSoup
page_source = requests.get('http://www.centrometeolombardo.com/content.asp?CatId=332&ContentType=Dati').text
soup = BeautifulSoup(page_source, 'lxml')
print([td.get_text(strip=True) for td in soup.select('div:nth-child(5) table:nth-child(3) tr:has(b:contains("Bollate")) td')])
Thanks to @SIM for pointing out that one can avoid hardcoding an index by using the following pattern instead:
soup.select("table > tr:has(> td > a:contains('Bollate')) td")
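To try the row selection without hitting the live site, here is a minimal sketch on a made-up HTML snippet. SAMPLE_HTML, select_row and find_row are hypothetical names, and the markup is only an assumption about how the real page nests its tables. It uses :-soup-contains, the newer spelling of :contains (assuming soupsieve 2.1 or later), and also shows a repaired version of the approach from the question: iterating over a Tag yields its direct children, including bare strings, which is where the NavigableString error came from, so it is safer to go from the matched string up to its tr.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: a class-less table nested in a layout table.
SAMPLE_HTML = """
<table><tr><td>
  <table>
    <tr><td><b>Arese</b></td><td>0.4</td><td>11.8</td></tr>
    <tr><td><b>Bollate</b></td><td>-0.3</td><td>12.3</td></tr>
  </table>
</td></tr></table>
"""


def select_row(html, station):
    """CSS approach: pick the tr whose own td holds a b tag with the station name."""
    soup = BeautifulSoup(html, "html.parser")
    selector = f'tr:has(> td > b:-soup-contains("{station}")) td'
    return [td.get_text(strip=True) for td in soup.select(selector)]


def find_row(html, station):
    """Navigation approach: find the exact string, then climb to its enclosing tr."""
    soup = BeautifulSoup(html, "html.parser")
    cell_text = soup.find(string=station)
    if cell_text is None:
        return []  # station not present in the document
    row = cell_text.find_parent("tr")
    return [td.get_text(strip=True) for td in row.find_all("td")]


print(select_row(SAMPLE_HTML, "Bollate"))
print(find_row(SAMPLE_HTML, "Bollate"))
```

Both functions return the cell texts of the matching row as a list, so you can index into it or unpack it instead of printing.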
Upvotes: 3
Reputation: 20042
You can use pandas to grab the HTML and parse the table. Then just select the value you need.
Here's how:
import pandas as pd
url = "http://www.centrometeolombardo.com/content.asp?CatId=332&ContentType=Dati"
df = pd.read_html(url, flavor="bs4")[19]
print(df.loc[df[0] == "Bollate"])
Output:
         0    1     2      3  4  5
2  Bollate -0.3  12.3  Brina  -  -
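The table index used above will break if the page layout changes. As a possible alternative, read_html accepts a match argument that keeps only tables containing the given text. Below is a minimal sketch on a made-up snippet. SAMPLE_TABLE and station_row are hypothetical names, the markup is an assumption and not the real page, and the default parser needs lxml (or bs4 plus html5lib) installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for one data table on the real page.
SAMPLE_TABLE = """
<table>
  <tr><td>Arese</td><td>0.4</td><td>11.8</td><td>-</td></tr>
  <tr><td>Bollate</td><td>-0.3</td><td>12.3</td><td>Brina</td></tr>
</table>
"""


def station_row(html, station):
    """Return the row(s) whose first column equals the station name."""
    # match= keeps only tables that contain the station text,
    # so no positional table index has to be hardcoded.
    tables = pd.read_html(StringIO(html), match=station)
    df = tables[0]
    return df.loc[df[0] == station]


print(station_row(SAMPLE_TABLE, "Bollate"))
```

On the real page the nested layout tables also contain the station text, so match may return several tables and you would still need to pick the innermost one.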
Upvotes: 3