Reputation: 13
I'm new to Python and I'm trying to scrape the currency buy/sell rates from the BMO website (https://www.bmo.com/home/personal/banking/rates/foreign-exchange). However, my scraper returns nothing.
I learned from a sample and wrote a very simple script, shown below, in Visual Studio 2019. I can print paragraph text from the page, but when I change the XPath to the path of a table cell, it returns nothing.
Scraping paragraph text works:
import requests
from lxml.html import etree
url = 'https://www.bmo.com/home/personal/banking/rates/foreign-exchange'
r=requests.get(url).text
s=etree.HTML(r)
file = s.xpath('//*[@id="main_content"]/p[2]/text()')
print(file)
This works and outputs: The rates provided ... bottom of the page as well.
When I changed the s.xpath argument to '//*[@id="ratesTable"]/tbody/tr[2]/td[3]/text()' (I'm trying to scrape the US dollar selling rate), it returned an empty list, '[]'. I inspected 'file' in the debugger: it was empty and its length was 0.
Did I do something wrong here? I believe the XPath and the URL are correct. I hope to get the decimal number 1.2931 (the selling rate) from the cell.
Upvotes: 0
Views: 247
Reputation: 13459
Many websites nowadays load content dynamically or modify the Document Object Model through JavaScript. Such websites can still be scraped, but you will have to dig into the JavaScript pieces.
In this particular case, the table is loaded through a JavaScript call, which you can verify by disabling JavaScript in your browser. If your browser supports it, open the web developer tools and inspect the Network tab, which shows all of the resources that were loaded to generate the page. Among these resources, you'll find a few interesting pieces of JavaScript, such as json_fx_include.js, which seems to hold the data you're looking for.
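If you'd rather check this from Python than from the browser, here is a rough sketch that counts the table rows in a piece of HTML. It uses the standard library's html.parser instead of lxml to stay dependency-free. The RowCounter class, the count_rows helper and both HTML snippets are made up for illustration; I'm only assuming that the raw page ships the table empty and that the browser fills it in later.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class RowCounter(HTMLParser):
    """Count <tr> elements nested inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # 0 means we are outside the target element
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "tr":
                self.rows += 1
            if tag not in VOID_TAGS:
                self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


def count_rows(html, table_id):
    """Return how many <tr> elements sit inside the element with id=table_id."""
    parser = RowCounter(table_id)
    parser.feed(html)
    return parser.rows


# Hypothetical snippets: what requests might receive vs. what the browser shows.
static_html = ('<div id="main_content"><p>intro</p>'
               '<table id="ratesTable"></table></div>')
rendered_html = ('<table id="ratesTable"><tbody>'
                 '<tr><th>Currency</th></tr>'
                 '<tr><td>USD</td><td>1.2931</td></tr>'
                 '</tbody></table>')

print(count_rows(static_html, "ratesTable"))    # 0
print(count_rows(rendered_html, "ratesTable"))  # 2
```

Feeding requests.get(url).text into count_rows in place of static_html would tell you whether the rows exist before any JavaScript runs. If the count is 0, no XPath will find the cell.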
Upvotes: 0
Reputation: 195583
The data you see on the page is loaded dynamically from a different URL through JavaScript. With the re and ast modules you can retrieve this information:
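import re
import requests
from ast import literal_eval
from pprint import pprint

data_url = 'https://www.bmo.com/bmocda/templates/json_fx_include.jsp'
# The response is JavaScript, not JSON: extract the object literal assigned to FX.
js_text = requests.get(data_url).text
# literal_eval safely turns that literal into a Python dict.
data = literal_eval(re.findall(r'FX = (\{.*?\});', js_text, flags=re.DOTALL)[0])

pprint(data)
print(data['USD']['NA']['BUY'])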
Prints:
{'EUR': {'NA': {'BUY': '1.4069', 'SELL': '1.5288'},
         'OA': {'BUY': '1.4472', 'SELL': '1.5137'},
         'OB': {'BUY': '1.4523', 'SELL': '1.5092'},
         'OC': {'BUY': '1.456', 'SELL': '1.5055'},
         'OD': {'BUY': '1.4634', 'SELL': '1.4982'}},
 'USD': {'NA': {'BUY': '1.2931', 'SELL': '1.3589'},
         'OA': {'BUY': '1.2958', 'SELL': '1.3562'},
         'OB': {'BUY': '1.3027', 'SELL': '1.3493'},
         'OC': {'BUY': '1.3061', 'SELL': '1.3459'},
         'OD': {'BUY': '1.3075', 'SELL': '1.3445'}}}
1.2931
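Since the rates come back as strings, you may want to wrap the extraction in a small function and convert values to Decimal before doing any arithmetic. Below is a minimal sketch that runs offline against a made-up payload. The parse_fx and rate helpers and the sample_js text are hypothetical; I'm assuming the real response has the same 'FX = {...};' shape that the regex above relies on.

```python
import re
from ast import literal_eval
from decimal import Decimal


def parse_fx(js_text):
    """Extract the object literal assigned to FX and return it as a dict."""
    match = re.search(r'FX = (\{.*?\});', js_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no 'FX = {...};' assignment found")
    return literal_eval(match.group(1))


def rate(fx, currency, tier, side):
    """Look up one rate, e.g. rate(fx, 'USD', 'NA', 'BUY'), as a Decimal."""
    return Decimal(fx[currency][tier][side])


# Hypothetical payload, shaped like the output printed above.
sample_js = """
var FX = {
  'USD': {'NA': {'BUY': '1.2931', 'SELL': '1.3589'}}
};
"""

fx = parse_fx(sample_js)
print(rate(fx, 'USD', 'NA', 'BUY'))   # 1.2931
print(rate(fx, 'USD', 'NA', 'SELL'))  # 1.3589
```

Raising ValueError when the pattern is missing gives a clearer failure than the IndexError you would get from re.findall(...)[0] if the site ever changes its format.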
Upvotes: 1