Shay Lavi

HTML web scraping for a value

I made a Python program with BeautifulSoup that is supposed to find a certain value on a site, but the program doesn't seem to find it.

from urllib.request import urlopen as ureq
from bs4 import BeautifulSoup as soup

my_url = 'http://www.calcalist.co.il/stocks/home/0,7340,L-4135-22212222,00.html?quote=%D7%93%D7%95%D7%9C%D7%A8'

# Download the page and parse it
uclient = ureq(my_url)
page_html = uclient.read()
uclient.close()
page_soup = soup(page_html, "html.parser")

# Look for the table cell that should hold the dollar rate
value = page_soup.find("td",{"class":"RightBlack"})
print(value)

The value I am trying to find is the dollar rate in Israeli currency (shekels), but for some reason the line of code that is supposed to retrieve that value:

value = page_soup.find("td",{"class":"RightBlack"})

can't find it (print(value) just prints None).
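One common reason find() returns None is that the element is not in the HTML you downloaded at all, because the page loads it inside an <iframe>, i.e. from a separate document. A minimal offline sketch of that situation, using hypothetical, simplified HTML strings in place of the real page (the src path is made up; the iframe id StockQuoteIFrame is the one named in the answer below):

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified outer page: it only references the quote
# table through an <iframe>, it does not contain the table itself
outer_html = """
<html><body>
  <iframe id="StockQuoteIFrame" src="/quote/frame.html"></iframe>
</body></html>
"""

# Hypothetical document served at the iframe's src (a separate request)
frame_html = '<table><tr><td class="RightBlack">3.5630</td></tr></table>'

outer_soup = BeautifulSoup(outer_html, "html.parser")
frame_soup = BeautifulSoup(frame_html, "html.parser")

# Not found: the <td> is not part of the outer document
print(outer_soup.find("td", {"class": "RightBlack"}))  # None

# Found once you parse the iframe's own document
print(frame_soup.find("td", {"class": "RightBlack"}).text)  # 3.5630

# The outer page only tells you where that document lives
for iframe in outer_soup.find_all("iframe"):
    print(iframe.get("id"), iframe.get("src"))  # StockQuoteIFrame /quote/frame.html
```

If the real page behaves like this, you have to request the iframe's src as well and search that response instead.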

Answers (1)

Vinícius Figueiredo

1. First option: using BeautifulSoup

Notice that the element you want is inside an iframe. An iframe's content is fetched with a separate request, different from the one you made, so it is not part of page_html. You can write code that iterates over all the iframes, fetches each one, and prints the price if iframe_soup.find("td",{"class":"RightBlack"}) finds it.

I'd recommend wrapping each request in a try/except block, since it's easy to fall into URL traps (broken or unreachable iframe URLs) when doing this:

from urllib.parse import urljoin
from urllib.request import urlopen as ureq
from bs4 import BeautifulSoup as soup

my_url = 'http://www.calcalist.co.il/stocks/home/0,7340,L-4135-22212222,00.html?quote=%D7%93%D7%95%D7%9C%D7%A8'

# Fetch the outer page; the rate itself is not in this HTML
with ureq(my_url) as uclient:
    page_soup = soup(uclient.read(), "html.parser")

iframesList = page_soup.find_all('iframe')
price = None
for i, iframe in enumerate(iframesList, start=1):
    print(i, ' out of ', len(iframesList), '...')
    src = iframe.get('src')
    if not src:
        continue  # skip iframes without a src attribute
    try:
        # urljoin resolves both relative ("/...") and absolute src values
        with ureq(urljoin(my_url, src)) as uclient:
            iframe_soup = soup(uclient.read(), "html.parser")
        price = iframe_soup.find("td",{"class":"RightBlack"})
        if price:
            print(price)
            break
    except Exception as e:
        print("something went wrong:", e)

Running the code outputs:

1  out of  8 ...
2  out of  8 ...
3  out of  8 ...
4  out of  8 ...
5  out of  8 ...
<td class="RightBlack">3.5630</td>

So now we have what we want:

>>> price
<td class="RightBlack">3.5630</td>
>>> price.text
'3.5630'
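The URL traps mentioned above mostly come from turning an iframe's src attribute into a full URL. A minimal sketch comparing plain string concatenation with the standard library's urllib.parse.urljoin; the src values are hypothetical examples of the forms a page can use:

```python
from urllib.parse import urljoin

# Page that contains the iframes (query string dropped for brevity)
page_url = 'http://www.calcalist.co.il/stocks/home/0,7340,L-4135-22212222,00.html'

# Hypothetical src values: root-relative, path-relative, absolute
sources = ['/stocks/frame.html', 'frame.html', 'https://example.com/widget']

for src in sources:
    naive = "http://www.calcalist.co.il" + src  # only correct for "/..." paths
    joined = urljoin(page_url, src)             # resolved against the page URL
    print(f"{src!r}\n  concat:  {naive}\n  urljoin: {joined}")
```

Only the root-relative form survives plain concatenation; the other two produce hosts like "www.calcalist.co.ilframe.html", which is exactly the kind of request that ends up in the except branch.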

2. Second option: using Selenium

This is what I'd recommend: to handle both the requests and the JavaScript, use Selenium, which drives a real browser that runs the page's JavaScript. Below I'm using ChromeDriver; for headless browsing you can run Chrome in headless mode (PhantomJS, which older answers suggest, is no longer maintained and is not supported by current Selenium releases). Inspecting the frame element shows that its id is "StockQuoteIFrame", so we switch the driver into that frame, and then we can easily find our price:

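from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://www.calcalist.co.il/stocks/home/0,7340,L-4135-22212222,00.html?quote=%D7%93%D7%95%D7%9C%D7%A8'

browser = webdriver.Chrome()
try:
    browser.get(url)
    # The quote table lives inside the iframe with id "StockQuoteIFrame",
    # so move the driver's context into that frame first.
    # Selenium 4 API: find_element(By...) replaces the removed find_element_by_* helpers
    browser.switch_to.frame(browser.find_element(By.ID, "StockQuoteIFrame"))
    price = browser.find_element(By.CLASS_NAME, "RightBlack").text
finally:
    browser.quit()  # always close the browser window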

The result is, of course, the same as with the first option:

>>> price
'3.5630'
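Either way, price ends up as a string. If you want to calculate with it, here is a minimal sketch using decimal.Decimal, which keeps the quoted digits exactly; the 100-dollar amount is a hypothetical example, and it assumes the quote is shekels per US dollar:

```python
from decimal import Decimal

# Text scraped from <td class="RightBlack">, as shown above
rate_text = '3.5630'

# Decimal keeps the exact quoted digits (a float would not)
rate = Decimal(rate_text.strip())

# Hypothetical example: convert 100 US dollars to shekels at that rate
usd_amount = Decimal('100')
ils_amount = usd_amount * rate
print(ils_amount)  # 356.3000
```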
