Reputation:
I'm trying to extract two values from this website:
On the right is the dollar rate, and on the left is its drop/rise as a percentage.
The problem is that after I get the dollar rate, the number is rounded for some reason (you can see it in the terminal). I want to get the exact number shown on the website.
Is there any beginner-friendly documentation for web scraping in Python?
P.S.: How can I get rid of the pop-up Python terminal window when running code in VS? I just want the output to appear in VS, in the interactive window.
from urllib.request import urlopen
from bs4 import BeautifulSoup

my_url = "https://www.bizportal.co.il/forex/quote/generalview/22212222"
# Download the raw HTML of the quote page
uClient = urlopen(my_url)
page_html = uClient.read()
uClient.close()
page_soup = BeautifulSoup(page_html, "html.parser")
# Collect every <div class="data-row"> element
div_class = page_soup.find_all("div", {"class": "data-row"})
print(div_class)
#print(div_class[0].text)
#print(div_class[1].text)
Upvotes: 1
Views: 768
Reputation: 195543
The data is loaded dynamically via Ajax, but you can simulate this request with the requests module:
import json
import requests
url = 'https://www.bizportal.co.il/forex/quote/generalview/22212222'
ajax_url = "https://www.bizportal.co.il/forex/quote/AjaxRequests/DailyDeals_Ajax?paperId={paperId}&take=20&skip=0&page=1&pageSize=20"
paper_id = url.rsplit('/')[-1]
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}
data = requests.get(ajax_url.format(paperId=paper_id), headers=headers).json()
# uncomment this to print all data:
#print(json.dumps(data, indent=4))
# print first one
print(data['Data'][0]['rate'], data['Data'][0]['PrecentageRateChange'])
Prints:
3.4823 -0.76%
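Since the question is about keeping the exact digits, here is a minimal sketch of pulling the two fields out of the parsed JSON and converting them with Decimal. The `sample` payload and the `latest_quote` helper are hypothetical and only mimic the two keys used above; the real response may be shaped differently, and `rate` may arrive as either a number or a string.

```python
from decimal import Decimal

# Hypothetical sample payload; the real response may contain more fields.
sample = {
    "Data": [
        {"rate": 3.4823, "PrecentageRateChange": "-0.76%"},
        {"rate": 3.4831, "PrecentageRateChange": "-0.74%"},
    ]
}


def latest_quote(payload):
    """Return (rate, percent_change) for the first entry in payload['Data']."""
    first = payload["Data"][0]
    # Go through str() so a float such as 3.4823 becomes Decimal('3.4823')
    # rather than the long binary expansion that Decimal(3.4823) would give.
    rate = Decimal(str(first["rate"]))
    # Strip the trailing percent sign before converting.
    change = Decimal(str(first["PrecentageRateChange"]).rstrip("%"))
    return rate, change


rate, change = latest_quote(sample)
print(rate, change)
```

With the live data, the same helper could be called on the dictionary returned by `.json()`, assuming the keys match.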
Upvotes: 2
Reputation: 437
The problem is that this element is updated dynamically with JavaScript. You will not be able to scrape the up-to-date value with urllib or requests. When the page loads, it is populated with a recent value (likely from a database), which is then replaced with the real-time number via JavaScript.
In this case it would be better to use something like Selenium to load the webpage. This lets the JavaScript execute on the page, after which you can scrape the numbers.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
options = Options()
options.add_argument("--headless") # allows you to scrape the page without opening the browser window
# Selenium 4 style: the driver path is passed via Service and elements are located with By
driver = webdriver.Chrome(service=Service('./chromedriver'), options=options)
driver.get("https://www.bizportal.co.il/forex/quote/generalview/22212222")
time.sleep(1) # put in to allow JS time to load, sometimes works without.
values = driver.find_elements(By.CLASS_NAME, 'num')
price = values[0].get_attribute("innerHTML")
change = values[1].find_element(By.CSS_SELECTOR, "span").get_attribute("innerHTML")
print(price, "\n", change)
driver.quit() # close the headless browser when done
Output:
╰─$ python selenium_scrape.py
3.483
-0.74%
You should familiarize yourself with Selenium and understand how to set it up and run it. This includes installing the browser (I am using Chrome here, but you can use others), knowing where to get the browser driver (Chromedriver in this case), and understanding how to parse the page. You can learn all about it here: https://www.selenium.dev/documentation/en/
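The fixed time.sleep(1) in the code above may be too short on a slow connection and wasted time on a fast one. One alternative is to poll until the values show up. Below is a minimal, library-independent sketch of that idea; `wait_until` and `fake_probe` are hypothetical names used only for illustration, and the fake probe stands in for a real element lookup.

```python
import time


def wait_until(probe, timeout=10.0, interval=0.25):
    """Call probe() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in for a page whose values appear only on the third poll.
attempts = []


def fake_probe():
    attempts.append(1)
    return ["3.483", "-0.74%"] if len(attempts) >= 3 else []


values = wait_until(fake_probe, timeout=2.0, interval=0.01)
print(values)
```

With Selenium, the probe could be a lambda that looks up the 'num' elements through the driver and returns the (possibly empty) list. Selenium also ships its own WebDriverWait class for this purpose, which is probably the better choice in real code.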
Upvotes: 0