Reputation: 14978
I am having a weird issue with Python and Selenium. I am accessing the URL https://www.biggerpockets.com/users/JarridJ1. When you click "more", it shows further content. As far as I can tell, it is a React-based website. When I open the page in a browser and do a View Source, I can see the data I need in a React element: <div data-react-class="Profile/Header/Header" data-react-props="{"
I tried to automate Firefox via Selenium, but I could not get the data that way either.
Check the screenshot:
from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def parse(u):
    print('Processing... {}'.format(u))
    driver.get(u)
    sleep(2)
    html = driver.page_source
    driver.save_screenshot('bp.png')
    print(html)


if __name__ == '__main__':
    options = Options()
    options.add_argument("--headless")  # Runs Chrome in headless mode.
    options.add_argument('--no-sandbox')  # Bypass OS security model
    options.add_argument('--disable-gpu')  # applicable to windows os only
    options.add_argument('start-maximized')
    options.add_argument('disable-infobars')
    options.add_argument("--disable-extensions")
    # Note: the Chrome options above are never passed to this Firefox driver.
    driver = webdriver.Firefox()
    parse('https://www.biggerpockets.com/users/JarridJ1')
Upvotes: 0
Views: 127
Reputation: 3790
This is a tricky one, but I found a way to get to the element you have highlighted. I am still not sure why driver.page_source is not returning what you are looking for.
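from selenium.webdriver.common.by import By


def parse(u):
    # Uses the `driver` and `sleep` names from the script in the question.
    print('Processing... {}'.format(u))
    driver.get(u)
    sleep(2)
    # Walk every element in the DOM and print its inner HTML.
    get_everything = driver.find_elements(By.XPATH, "//*")
    for element in get_everything:
        print(element.get_attribute('innerHTML'))
    #html = driver.page_source
    #driver.save_screenshot('bp.png')
    #print(html)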
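One possible factor, though I have not confirmed it for this page, is that a fixed sleep(2) can return before React has finished rendering. A rough sketch of polling instead of sleeping is below; wait_until is a hypothetical helper written for this post, not part of Selenium, and it is only similar in spirit to Selenium's WebDriverWait.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {}s".format(timeout))
        time.sleep(interval)


# Hypothetical usage with the driver from the question (not executed here):
# elements = wait_until(
#     lambda: driver.find_elements(
#         By.XPATH, "//div[@data-react-class='Profile/Header/Header']"
#     )
# )
```

Since find_elements returns an empty list when nothing matches, the lambda stays falsy until the element appears in the DOM.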
Below is my standalone example:
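import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Raw string so the backslashes in the Windows path are not treated as escapes.
# Selenium 4 takes the driver path through a Service object.
driver = webdriver.Chrome(service=Service(r"C:\Path\To\chromedriver.exe"))
driver.get("https://www.biggerpockets.com/users/JarridJ1")
time.sleep(5)  # crude wait for the React components to render

# Grab the React header element and read its props attribute.
a = driver.find_element(By.XPATH, "//div[@data-react-class='Profile/Header/Header']")
b = a.get_attribute("data-react-props")
print(b)

# Dump the inner HTML of every element on the page.
c = driver.find_elements(By.XPATH, "//*")
for i in c:
    print(i.get_attribute('innerHTML'))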
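The value printed as b above is the raw data-react-props attribute. Judging by the snippet in the question, which starts with "{", it looks like JSON, so it can probably be decoded into a dict. Here is a minimal sketch using only the standard library, assuming you already have the page HTML as a string (for example from View Source). ReactPropsParser and extract_react_props are names made up for this example, and the sample markup is invented; the real props will have different keys.

```python
import json
from html.parser import HTMLParser


class ReactPropsParser(HTMLParser):
    """Collect decoded data-react-props for a given data-react-class."""

    def __init__(self, react_class):
        super().__init__()
        self.react_class = react_class
        self.props = []  # one decoded object per matching element

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("data-react-class") == self.react_class:
            raw = attr_map.get("data-react-props")
            if raw:
                # HTMLParser has already unescaped &quot; and friends here.
                self.props.append(json.loads(raw))


def extract_react_props(html, react_class):
    parser = ReactPropsParser(react_class)
    parser.feed(html)
    return parser.props


# Invented sample markup, standing in for the real page source.
sample = (
    '<div data-react-class="Profile/Header/Header" '
    'data-react-props="{&quot;name&quot;: &quot;Example&quot;}"></div>'
)
print(extract_react_props(sample, "Profile/Header/Header"))
```

With Selenium, the string returned by get_attribute("data-react-props") is already unescaped, so json.loads(b) on its own should be enough in that case.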
Upvotes: 1