draw134
draw134

Reputation: 1187

How to get data from javascript rendered table using selenium in python

I have a website to scrape and i am using selenium to do it. When i finished writing the code, i noticed that i was not getting output at all when i print the table contents. I viewed the page source and then i found out that the table was not in the source. That is why even i find the xpath of the table from inspect element i cant get any output of it. Do someone know how could I get the response/data or just printing the table from the javascript response? Thanks.

Here is my current code

from bs4 import BeautifulSoup
from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--incognito')
chrome_path = r"C:\chromedriver.exe"
driver = webdriver.Chrome(chrome_path,options=options)

driver.implicitly_wait(3)
url = "https://reversewhois.domaintools.com/?refine#q=
%5B%5B%5B%22whois%22%2C%222%22%2C%22VerifiedID%40SG-Mandatory%22%5D%5D%5D"
driver.get(url)
html = driver.page_source
soup = BeautifulSoup(html,'lxml')

#These line of codes is for selecting the desired search parameter from the combo box, you can disregard this since i was putting the whole url with params
input = driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[3]/input')
driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[1]/div').click()
driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[5]/div[1]/div/div[3]').click()
driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[2]/div/div[1]').click()
driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[6]/div[1]/div/div[1]').click
input.send_keys("VerifiedID@SG-Mandatory")
driver.find_element_by_xpath('//*[@id="search-button-container"]/button').click()


table = driver.find_elements_by_xpath('//*[@id="refine-preview-content"]/table/tbody/tr/td')
for i in table:
     print(i) no output

I just want to scrape all the domain names like in the first result like 0 _ _ .sg

Upvotes: 0

Views: 2190

Answers (2)

QHarr
QHarr

Reputation: 84475

If you are open to other ways does the following give the expected results? It mimics the xhr the page does (though I have trimmed it down to essential elements only) to retrieve the lookup results. Faster than using a browser.

from bs4 import BeautifulSoup as bs
import requests
import pandas as pd

headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get('https://reversewhois.domaintools.com/?ajax=mReverseWhois&call=ajaxUpdateRefinePreview&q=[[[%22whois%22,%222%22,%22VerifiedID@SG-Mandatory%22]]]&sf=true', headers=headers)
table = pd.read_html(r.json()['results'])
print(table)

Upvotes: 1

Prakhar Jhudele
Prakhar Jhudele

Reputation: 965

You can try the below code. After you have selected all the details options to fill and click on the search button it is kind of an implicit wait to make sure we get the full page source. Then we used the read_html from pandas which searches for any tables present in the html and returns a list of dataframe. we take the required df from there.

from selenium import webdriver
import time
from selenium.webdriver.chrome.options import Options
import pandas as pd

options = Options()
options.add_argument('--incognito')
chrome_path = r"C:/Users/prakh/Documents/PythonScripts/chromedriver.exe"
driver = webdriver.Chrome(chrome_path,options=options)

driver.implicitly_wait(3)
url = "https://reversewhois.domaintools.com/?refine#q=%5B%5B%5B%22whois%22%2C%222%22%2C%22VerifiedID%40SG-Mandatory%22%5D%5D%5D"
driver.get(url)
#html = driver.page_source
#soup = BeautifulSoup(html,'lxml')

#These line of codes is for selecting the desired search parameter from the combo box
input = driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[3]/input')
driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[1]/div').click()
driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[5]/div[1]/div/div[3]').click()
driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[2]/div/div[1]').click()
driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[6]/div[1]/div/div[1]').click
input.send_keys("VerifiedID@SG-Mandatory")
driver.find_element_by_xpath('//*[@id="search-button-container"]/button').click()

time.sleep(5)
html = driver.page_source
tables = pd.read_html(html)

df = tables[-1]
print(df)

Upvotes: 1

Related Questions