Reputation: 177
I have a problem I need help with. I am trying to scrape some numbers from a website (see the link in the code below). Because the page is rendered with JavaScript, I am using Selenium to load it first and then passing the page source to lxml to parse the data.
The code I am using is the following:
from selenium import webdriver
from lxml import html
import time
url = "http://sebgroup.com/large-corporates-and-institutions/prospectuses-and-downloads/rates/swap-rates"
xpath = '//*[@id="doc"]/table[2]/tbody/tr[3]/text()'
chrome_path = "my_path"
browser = webdriver.Chrome(chrome_path)
browser.get(url)
time.sleep(10)
html_source = browser.page_source
tree = html.fromstring(html_source)
text = tree.xpath(xpath)
print(text)
When I view the page directly in my browser, I can see the numbers in the source code. But when I do the same thing with Selenium, the source code I get is different. Is this because the website uses some anti-scraping software? Is there any way to still get the data? (I need it for academic use.)
Upvotes: 2
Views: 4472
Reputation: 52695
The table you want is located inside an iframe, so you need to switch to it before getting the page source. Try the following:
chrome_path = "my_path"
browser = webdriver.Chrome(chrome_path)
browser.get(url)
time.sleep(10)
browser.switch_to.frame(browser.find_element_by_tag_name("iframe"))
html_source = browser.page_source
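If you are on a recent Selenium 4 release, note that the find_element_by_* helpers have been removed there, and an explicit wait is usually more reliable than a fixed time.sleep. Below is a rough sketch of the same idea under those assumptions. The names clean_cells and fetch_frame_rows are hypothetical helpers, and the default row XPath and the choice of the first iframe are guesses about the page, not something I have verified:

```python
def clean_cells(strings):
    """Strip whitespace from text nodes and drop the ones that end up empty."""
    return [s.strip() for s in strings if s.strip()]


def fetch_frame_rows(url, row_xpath="//table//tr", timeout=10):
    """Return the cell text of every row matched inside the page's first iframe.

    Hypothetical helper: the default row_xpath and the use of the first
    iframe are assumptions, not taken from the actual page.
    """
    # Third-party imports are kept inside the function so that clean_cells
    # can be used without selenium or lxml installed.
    from lxml import html
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumes a Selenium 4 release that can locate chromedriver on its own.
    browser = webdriver.Chrome()
    try:
        browser.get(url)
        wait = WebDriverWait(browser, timeout)
        # Wait for the iframe and switch into it in one step.
        wait.until(EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME, "iframe")))
        # Wait until a table exists inside the frame before reading the source.
        wait.until(EC.presence_of_element_located((By.TAG_NAME, "table")))
        # page_source now refers to the frame's document, not the outer page.
        tree = html.fromstring(browser.page_source)
    finally:
        browser.quit()

    # Collect the text of all cells per row and skip rows with no text at all.
    rows = (clean_cells(row.xpath(".//text()")) for row in tree.xpath(row_xpath))
    return [cells for cells in rows if cells]
```

Calling fetch_frame_rows(url) with the url from the question should give one list of strings per table row. Collecting text from the descendants of each row, rather than calling text() on the tr itself, matters because the direct text nodes of a tr are normally just whitespace between the cells.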
Upvotes: 2