Reputation: 267
I'm using Anaconda and BeautifulSoup to scrape data off a site.
import requests
resp = requests.get('https://www.url.com')
Weathertest = resp.text
from bs4 import BeautifulSoup
soup = BeautifulSoup(Weathertest,'lxml')
mintemp = BeautifulSoup(Weathertest, 'lxml')
mintemp.find_all('p',class_='weatherhistory_results_datavalue temp_mn')
What I'm trying to do is pull the minimum temperature for a particular day. Here's the relevant HTML from the page:
<tr class="weatherhistory_results_datavalue temp_mn"><th><h3>Minimum Temperature</h3></th><td><p><span class="value">47.3</span> <span class="units">°F</span></p></td></tr>
After trying the above and getting [] as the result, I realized that the weatherhistory_results_datavalue temp_mn class belongs to the tr element, not to a p element, so that search can't match anything. Instead, I tried:
mintemp = BeautifulSoup(Weathertest, 'lxml')
mintemp.find_all('tr',class_='weatherhistory_results_datavalue temp_mn')
This returns the entire HTML string above (from the opening tr tag through the closing /tr). I've searched for how to pull the value inside the p from a tr matched by class, but I'm not coming up with anything. I'm fairly new to all this, so I'm sure it's something simple I just don't know yet.
Or perhaps I need a compound statement, like "find all the tr elements with the class above and then give me the p value", but I'm not sure how to code that.
Upvotes: 1
Views: 466
Reputation: 1030
Try this: find the tr by its class, then search inside each match for the span with class "value".
>>> data = """<tr class="weatherhistory_results_datavalue temp_mn"><th><h3>Minimum Temperature</h3></th><td><p><span class="value">47.3</span> <span class="units">°F</span></p></td></tr>"""
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(data, "lxml")
>>> # find_all returns Tag objects, and each Tag can be searched again
>>> temp = soup.find_all("tr", {"class": "weatherhistory_results_datavalue temp_mn"})
>>> for i in temp:
...     a = i.find("span", {"class": "value"})  # the span holding the number
...     print(a.text)
...
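As an alternative to the loop above, the "find the tr, then look inside it" step can probably be written as a single CSS selector with select_one. This is only a sketch under a few assumptions: the page contains exactly the markup shown in the question, the helper name min_temperature is made up for illustration, and "html.parser" is used instead of "lxml" only so the snippet runs without an extra install.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; a real page would come from resp.text.
html = """<tr class="weatherhistory_results_datavalue temp_mn"><th><h3>Minimum Temperature</h3></th><td><p><span class="value">47.3</span> <span class="units">°F</span></p></td></tr>"""

# "html.parser" ships with Python; "lxml" should work here as well if installed.
soup = BeautifulSoup(html, "html.parser")


def min_temperature(soup):
    """Hypothetical helper: return the minimum temperature as a float, or None."""
    # tr.a.b matches a <tr> carrying both classes; the space then descends
    # to any <span class="value"> nested inside that row.
    span = soup.select_one("tr.weatherhistory_results_datavalue.temp_mn span.value")
    if span is None:
        return None  # row or span missing, e.g. the page layout changed
    return float(span.get_text(strip=True))


min_temp = min_temperature(soup)
print(min_temp)
```

Returning None when nothing matches avoids the AttributeError you would otherwise get from calling .text on a failed find, which is worth having if the site ever changes its markup.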
Upvotes: 1