Reputation: 281
I am trying to parse (623) 337-**** from a JS-generated site. My code is:
from selenium import webdriver
import re

browser = webdriver.Firefox()
browser.get('http://www.spokeo.com/search?q=Joe+Henderson,+Phoenix,+AZ&sao7=t104#:18643819031')
content = browser.page_source  # HTML of the page as rendered at this moment
browser.quit()

# Look for a masked phone number such as "(623) 337-****"
m_obj = re.search(r"(\(\d{3}\)\s\d{3}-\*{4})", content)
if m_obj:
    print(m_obj.group(0))
For some reason it doesn't print anything. Any help is appreciated.
Side note: is there a faster way to do this in Python?
Upvotes: 1
Views: 93
Reputation: 473833
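The problem is that part of the content, including the phone number, is loaded dynamically by AJAX requests that run after the initial page load. When you read page_source right after get(), those requests have not finished yet, so the number is not in the HTML you search.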
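The regular expression itself is not the culprit. One quick way to confirm that is to run the same pattern against two hand-written HTML snippets: one as the page might look before the AJAX response arrives, and one after the number has been filled in. Both snippets are made up for illustration and are not Spokeo's actual markup; find_masked_phone is a hypothetical helper name.

```python
import re

# Same pattern as in the question: "(ddd) ddd-****"
PHONE_RE = re.compile(r"\(\d{3}\)\s\d{3}-\*{4}")

# Hypothetical markup before the AJAX request completes: no number yet.
initial_html = '<div id="phones"><span class="loading">Loading...</span></div>'

# Hypothetical markup after the AJAX response has been inserted into the DOM.
loaded_html = '<div id="phones"><span>(623) 337-****</span></div>'


def find_masked_phone(html):
    """Return the first masked phone number found in html, or None."""
    match = PHONE_RE.search(html)
    return match.group(0) if match else None


print(find_masked_phone(initial_html))  # None
print(find_masked_phone(loaded_html))   # (623) 337-****
```

If the pattern matches the second snippet, an empty result on the live page points at timing rather than at the regex.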
You should wait until a specific element is present on the page (see the Selenium documentation on explicit waits), and only then read the page source:
import re

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

browser = webdriver.Firefox()
try:
    browser.get('http://www.spokeo.com/search?q=Joe+Henderson,+Phoenix,+AZ&sao7=t104#:18643819031')
    # Block for up to 10 seconds until the profile header has been added to the DOM
    WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID, "profile_details_section_header")))
    content = browser.page_source
finally:
    # Close the browser even if the wait times out
    browser.quit()

m_obj = re.search(r"(\(\d{3}\)\s\d{3}-\*{4})", content)
if m_obj:
    print(m_obj.group(0))
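Alternatively, you could call time.sleep() or browser.implicitly_wait() instead, though neither is a great fit: time.sleep() always waits the full duration even if the content arrived earlier, and an implicit wait only applies to find_element lookups, so on its own it does not delay reading page_source.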
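What makes an explicit wait better than a fixed sleep is that it polls a condition and returns as soon as the condition holds. The stdlib-only sketch below imitates that behaviour so it can run without a browser. wait_until, FakeDriver and the polling interval are hypothetical names chosen for illustration, not Selenium API. With real Selenium, WebDriverWait(browser, 10).until(...) accepts any callable that takes the driver, so a condition like the lambda below could be passed to it directly to wait for the phone number itself rather than for a header element.

```python
import re
import time

PHONE_RE = re.compile(r"\(\d{3}\)\s\d{3}-\*{4}")


def wait_until(driver, condition, timeout=10.0, poll=0.5):
    """Hypothetical helper: call condition(driver) every `poll` seconds until
    it returns a truthy value, or raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


class FakeDriver:
    """Hypothetical stand-in for a WebDriver: the number 'arrives' on the
    third read of page_source, imitating content filled in by AJAX."""

    def __init__(self, ready_after=3):
        self._reads = 0
        self._ready_after = ready_after

    @property
    def page_source(self):
        self._reads += 1
        if self._reads >= self._ready_after:
            return "<span>(623) 337-****</span>"
        return "<span>Loading...</span>"


driver = FakeDriver(ready_after=3)
# Returns the regex match object as soon as the number shows up.
match = wait_until(driver, lambda d: PHONE_RE.search(d.page_source), timeout=5, poll=0.1)
print(match.group(0))  # (623) 337-****
```

Unlike a fixed sleep, this returns after roughly two polling intervals here, and it fails loudly instead of silently returning an incomplete page when the content never appears.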
This prints (623) 337-****.
Hope that helps.
Upvotes: 1