Python Web Scraping with Selenium and lxml

Question

I have a problem I need help with. I am trying to scrape some numbers from a website (see the link in the code below). Because the page is rendered with JavaScript, I use Selenium to load it first and then pass the page source to lxml to parse the data.

The code I am using is the following:

from selenium import webdriver
from lxml import html
import time

url = "http://sebgroup.com/large-corporates-and-institutions/prospectuses-and-downloads/rates/swap-rates"
xpath = '//*[@id="doc"]/table[2]/tbody/tr[3]/text()'

chrome_path = "my_path"
browser = webdriver.Chrome(chrome_path)
browser.get(url)
time.sleep(10)

html_source = browser.page_source

tree = html.fromstring(html_source)
text = tree.xpath(xpath)
print(text)

When I look at the page directly through my browser, I can see the numbers in the source code. But when I do the same thing using Selenium, the source code I see is different. Is this because the website has some anti-scraping software? Is there any way to still get the data? (I need it for academic use.)

Andersson · Accepted Answer

The table you want is located inside an iframe, so you need to switch to that frame before reading the page source. Try the following:

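# Same setup as in the question (url and the imports are defined there).
chrome_path = "my_path"
browser = webdriver.Chrome(chrome_path)
browser.get(url)
time.sleep(10)
# Switch the driver's context to the first iframe on the page.
# Note: the find_element_by_* helpers were removed in Selenium 4;
# there, use browser.find_element(By.TAG_NAME, "iframe") instead.
browser.switch_to.frame(browser.find_element_by_tag_name("iframe"))
# page_source now returns the iframe's document, not the outer page.
html_source = browser.page_source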
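
For reference, here is a minimal sketch of the whole flow written against the Selenium 4 API, with an explicit wait in place of the fixed time.sleep(10). The function names extract_row_text and fetch_iframe_source are hypothetical helpers, not part of either library. It assumes the table sits in the first iframe on the page and that a Chrome driver can be found without passing a path. Note also that the XPath in the question ends in tr[3]/text(), which selects only text nodes that are direct children of the row. Cell values live inside the td elements, so the sketch selects the row itself and collects its descendant text.

```python
from lxml import html


def extract_row_text(page_source, row_xpath):
    """Return the non-empty text fragments of the first row matched by row_xpath."""
    tree = html.fromstring(page_source)
    rows = tree.xpath(row_xpath)
    if not rows:
        return []
    # Gather text from all descendants of the row and drop whitespace-only nodes.
    return [t.strip() for t in rows[0].xpath(".//text()") if t.strip()]


def fetch_iframe_source(url, timeout=10):
    """Load url in Chrome, switch into the first iframe, and return its source."""
    # Imported lazily so the parsing helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumption: the Chrome driver is discoverable (on PATH or via Selenium Manager).
    browser = webdriver.Chrome()
    try:
        browser.get(url)
        # Wait until the iframe exists, then switch into it in a single step.
        WebDriverWait(browser, timeout).until(
            EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME, "iframe"))
        )
        return browser.page_source
    finally:
        browser.quit()
```

With these in place, extract_row_text(fetch_iframe_source(url), '//*[@id="doc"]/table[2]/tbody/tr[3]') should return the cell texts of that row, provided the XPath still matches the markup inside the iframe.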
