Reputation: 103
I made a Python program with BeautifulSoup that is supposed to find a certain value on a site, but the program doesn't seem to find it.
import bs4
from urllib.request import urlopen as ureq
from bs4 import BeautifulSoup as soup
my_url = 'http://www.calcalist.co.il/stocks/home/0,7340,L-4135-22212222,00.html?quote=%D7%93%D7%95%D7%9C%D7%A8'
uclient = ureq(my_url)
page_html = uclient.read()
uclient.close()
page_soup = soup(page_html, "html.parser")
value = page_soup.find("td",{"class":"RightBlack"})
print(value)
The value I am trying to find is the dollar exchange rate in Israeli currency, but for some reason the line of code that is supposed to retrieve that value:
value = page_soup.find("td",{"class":"RightBlack"})
can't find it.
Upvotes: 4
Views: 909
Reputation: 6518
Notice that the element you want is inside an iframe, which means its content comes from a separate request, different from the one you made. You can write code that iterates over all iframes, requests each one, and prints the price as soon as iframe_soup.find("td",{"class":"RightBlack"}) finds a match.
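To illustrate why the original find call comes back empty, here is a minimal offline sketch. The two HTML strings are hypothetical stand-ins for the real page and its frame (the src path is made up), so treat this as a demonstration of the idea rather than the site's actual markup:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins: the outer page only references the frame,
# while the rate lives in the frame's own document.
OUTER_HTML = """
<html><body>
  <h1>Quotes</h1>
  <iframe id="StockQuoteIFrame" src="/stocks/quote_frame.html"></iframe>
</body></html>
"""
FRAME_HTML = """
<html><body>
  <table><tr><td class="RightBlack">3.5630</td></tr></table>
</body></html>
"""

outer_soup = BeautifulSoup(OUTER_HTML, "html.parser")
frame_soup = BeautifulSoup(FRAME_HTML, "html.parser")

# The outer document has no matching <td>, only a pointer to the frame.
outer_hit = outer_soup.find("td", {"class": "RightBlack"})
frame_sources = [frame.get("src") for frame in outer_soup.find_all("iframe")]

# The frame's document, parsed separately, does contain the cell.
frame_hit = frame_soup.find("td", {"class": "RightBlack"})

print(outer_hit)
print(frame_sources)
print(frame_hit.text)
```

The outer soup never sees the cell because the browser loads the frame's HTML with its own request; the parser only gets the iframe tag and its src.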
I'd recommend wrapping each request in a try/except block, since it's easy to run into bad URLs when iterating over iframes like this:
from urllib.request import urlopen as ureq
from bs4 import BeautifulSoup as soup
my_url = 'http://www.calcalist.co.il/stocks/home/0,7340,L-4135-22212222,00.html?quote=%D7%93%D7%95%D7%9C%D7%A8'
uclient = ureq(my_url)
page_html = uclient.read()
page_soup = soup(page_html, "html.parser")
iframesList = page_soup.find_all('iframe')
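# Each iframe is a separate document, so fetch and parse them one by one
for i, iframe in enumerate(iframesList, start=1):
    print(i, 'out of', len(iframesList), '...')
    try:
        # Assumes src is a site-relative path such as "/stocks/..."
        uclient = ureq("http://www.calcalist.co.il" + iframe.attrs['src'])
        iframe_soup = soup(uclient.read(), "html.parser")
        price = iframe_soup.find("td", {"class": "RightBlack"})
        if price:
            print(price)
            break
    except Exception:
        # Missing src attribute, bad URL, network error, etc.
        print("something went wrong")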
Running the code, this outputs:
1 out of 8 ...
2 out of 8 ...
3 out of 8 ...
4 out of 8 ...
5 out of 8 ...
<td class="RightBlack">3.5630</td>
So now we have what we want:
>>> price
<td class="RightBlack">3.5630</td>
>>> price.text
'3.5630'
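One of the URL traps mentioned above is that the loop builds each frame URL by string concatenation, which only works when src is a site-relative path. A possible alternative, sketched here with the standard library's urljoin, resolves src against the page URL instead. The helper name resolve_iframe_src and the sample URLs are made up for illustration:

```python
from urllib.parse import urljoin

# Hypothetical page URL used only for this illustration.
PAGE_URL = "http://www.calcalist.co.il/stocks/home/page.html"

def resolve_iframe_src(page_url, src):
    """Return an absolute URL for an iframe src, or None if src is missing or empty."""
    if not src:
        return None
    return urljoin(page_url, src.strip())

# Site-relative, absolute, protocol-relative, and document-relative forms.
samples = [
    "/stocks/quote.html",
    "http://example.com/frame.html",
    "//example.com/frame.html",
    "frame.html",
    None,
]
resolved = [resolve_iframe_src(PAGE_URL, s) for s in samples]
for original, absolute in zip(samples, resolved):
    print(repr(original), "->", absolute)
```

With this in place, the request inside the loop could use the resolved URL and skip frames where the helper returns None, rather than relying on the except branch to catch them.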
Selenium
This is a recommendation: to make requests and handle JavaScript, you should use Selenium, which drives a real browser with a JS interpreter. Below I'm using ChromeDriver, but you can also use PhantomJS for headless browsing. Inspecting the frame element, we know its id is "StockQuoteIFrame". To get there, we use .switch_to.frame (called .switch_to_frame in older Selenium releases), and then we can easily find our price:
from selenium import webdriver
from selenium.webdriver.common.by import By
url = 'http://www.calcalist.co.il/stocks/home/0,7340,L-4135-22212222,00.html?quote=%D7%93%D7%95%D7%9C%D7%A8'
browser = webdriver.Chrome()
browser.get(url)
# Move into the iframe's document before searching for the element
browser.switch_to.frame(browser.find_element(By.ID, "StockQuoteIFrame"))
price = browser.find_element(By.CLASS_NAME, "RightBlack").text
The output, of course, is the same as the first option:
>>> price
'3.5630'
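Either way, price ends up as a string. If you want to do arithmetic with it, a small conversion step helps. The sketch below uses a hypothetical helper, parse_rate, and assumes the cell text is a plain decimal number, possibly with surrounding whitespace or comma thousands separators:

```python
from decimal import Decimal, InvalidOperation

def parse_rate(text):
    """Convert scraped cell text such as '3.5630' to a Decimal, or None if it cannot be parsed."""
    cleaned = text.strip().replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None

rate = parse_rate(" 3.5630 ")
# Value of 100 dollars at the scraped rate.
shekels = rate * Decimal("100")
print(rate, shekels)
```

Decimal is used here instead of float so the rate keeps exactly the digits shown on the page.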
Upvotes: 3