user1008791

Reputation: 1409

Python Selenium accessing HTML source

How can I get the HTML source in a variable using the Selenium module with Python?

I wanted to do something like this:

from selenium import webdriver

browser = webdriver.Firefox()
browser.get("http://example.com")
if "whatever" in html_source:
    # Do something
else:
    # Do something else

How can I do this? I don't know how to access the HTML source.

Upvotes: 138

Views: 240995

Answers (9)

user11555371

Reputation:

Complete code:

from selenium import webdriver

# Initialize the WebDriver
driver = webdriver.Chrome()  # Use the appropriate WebDriver for your browser

# Navigate to the desired URL
driver.get("https://www.example.com/")

# Access the page's HTML source
html_source = driver.page_source

if "whatever" in html_source:
    pass  # do something
else:
    pass  # do something else

# Print the complete source if you want to inspect it.
print(html_source)

# Close the WebDriver
driver.quit()
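
If `driver.get` raises (a timeout, an unreachable host), the `driver.quit()` line above is never reached and the browser process can be left running. A minimal sketch of one way to guard against that, assuming a hypothetical helper named `fetch_source` (not part of Selenium) that accepts any already-created driver:

```python
def fetch_source(driver, url):
    """Load url, return the page source, and always quit the driver.

    fetch_source is a hypothetical helper, not a Selenium API. It relies
    only on the driver's get(), page_source and quit() members.
    """
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Runs on success and on error, so the browser is always closed.
        driver.quit()
```

With a real browser this would be called as `fetch_source(webdriver.Chrome(), "https://www.example.com/")`.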

Upvotes: 0

Mobin Al Hassan

Reputation: 1044

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://example.com")  # Load a page first; otherwise there is nothing to read.
html_source_code = driver.execute_script("return document.body.innerHTML;")
html_soup: BeautifulSoup = BeautifulSoup(html_source_code, 'html.parser')

Now you can use BeautifulSoup's functions to extract data.
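
Note that `document.body.innerHTML` returns only the contents of `<body>`, so the `<head>` section and the `<html>` tag itself are left out. A small sketch of a helper that can return either form, assuming a hypothetical function name `rendered_html`; it relies only on the driver's `execute_script` method:

```python
def rendered_html(driver, whole_document=True):
    """Return the DOM as currently rendered, serialized to a string.

    rendered_html is a hypothetical helper, not a Selenium API.
    """
    if whole_document:
        # <html> element including <head> and <body>.
        script = "return document.documentElement.outerHTML;"
    else:
        # Only what is inside <body>, as in the snippet above.
        script = "return document.body.innerHTML;"
    return driver.execute_script(script)
```

Either string can be passed to `BeautifulSoup(..., 'html.parser')` in the same way.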

Upvotes: 18

AutomatedTester

Reputation: 22418

You need to access the page_source property:

from selenium import webdriver

browser = webdriver.Firefox()
browser.get("http://example.com")

html_source = browser.page_source
if "whatever" in html_source:
    pass  # do something
else:
    pass  # do something else
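
One caveat: `page_source` reflects the page at the moment it is read, so text that JavaScript inserts later may be missing right after `get()` returns. A rough sketch of polling for it, assuming a hypothetical helper `wait_for_text` with illustrative `timeout` and `poll` parameters. Selenium ships its own `WebDriverWait` for this kind of waiting; the hand-rolled loop here only keeps the example free of extra imports:

```python
import time


def wait_for_text(driver, text, timeout=10.0, poll=0.5):
    """Return True once text shows up in driver.page_source.

    Returns False if it has not appeared after roughly timeout seconds.
    wait_for_text is a hypothetical helper, not a Selenium API.
    """
    deadline = time.monotonic() + timeout
    while True:
        if text in driver.page_source:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

It would be used in place of the plain membership test, for example `if wait_for_text(browser, "whatever"):`.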

Upvotes: 262

SysMurff

Reputation: 122

You can simply use the WebDriver object and access the page source through its page_source property.

Try this code snippet :-)

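from selenium import webdriver
from selenium.webdriver.firefox.service import Service

# Selenium 4 takes the driver executable's path through a Service object.
driver = webdriver.Firefox(service=Service('path/to/executable'))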
driver.get('https://some-domain.com')
source = driver.page_source
if 'stuff' in source:
    print('found...')
else:
    print('not in source...')

Upvotes: 1

Dhiraj

Reputation: 497

driver.page_source will help you get the page source code. You can check if the text is present in the page source or not.

from selenium import webdriver
driver = webdriver.Firefox()
driver.get("some url")
if "your text here" in driver.page_source:
    print('Found it!')
else:
    print('Did not find it.')

If you want to store the page source in a variable, add the line below after driver.get:

var_pgsource = driver.page_source

and change the if condition to:

if "your text here" in var_pgsource:
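
Keep in mind that `in` on the raw source also matches text inside tags, attributes, scripts and styles. If only the text a user would read matters, one option is to strip the markup first. The sketch below uses the standard library's `html.parser` (Beautiful Soup would work as well); `visible_text` and `_TextExtractor` are hypothetical names, and the handling is deliberately rough, since it knows nothing about elements hidden by CSS:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html_source):
    """Return the text content of html_source with the markup removed."""
    parser = _TextExtractor()
    parser.feed(html_source)
    parser.close()
    return " ".join(parser.chunks)
```

The check then becomes `if "your text here" in visible_text(driver.page_source):`.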

Upvotes: 8

Mahesh Reddy Atla

Reputation: 529

The page source gives you the whole HTML document.
So first decide which block or tag holds the data you want to retrieve or the element you want to click:

from selenium.webdriver.common.by import By

options = driver.find_elements(By.NAME, "XXX")
for option in options:
    if option.text == "XXXXXX":
        print(option.text)
        option.click()

You can find elements by name, XPath, id, link text, and CSS selector.

Upvotes: 3

Bob Evans

Reputation: 616

To answer your question about getting the URL to use for urllib, just execute this JavaScript code:

url = browser.execute_script("return window.location.href;")

Upvotes: 1

Griffin

Reputation: 644

I'd recommend getting the source with urllib and, if you're going to parse, use something like Beautiful Soup.

from urllib.request import urlopen  # Python 3; Python 2 used urllib.urlopen

response = urlopen("http://example.com")  # Open the URL.
content = response.readlines()  # Read the source (a list of bytes lines) into a variable.

Upvotes: -7

Milanka

Reputation: 1842

With Selenium2Library you can use get_source()

import Selenium2Library
s = Selenium2Library.Selenium2Library()
s.open_browser("localhost:7080", "firefox")
source = s.get_source()

Upvotes: 5
