Reputation: 39179
I'm using the Python bindings to run Selenium WebDriver:
from selenium import webdriver
wd = webdriver.Firefox()
I know I can grab a webelement like so:
elem = wd.find_element_by_css_selector('#my-id')
And I know I can get the full page source with...
wd.page_source
But is there a way to get the "element source"?
elem.source # <-- returns the HTML as a string
The Selenium WebDriver documentation for Python is basically non-existent, and I don't see anything in the code that seems to enable that functionality.
What is the best way to access the HTML of an element (and its children)?
Upvotes: 678
Views: 752676
Reputation: 1139
Here's how to get the HTML source code using Selenium Python:
elem = driver.find_element("xpath", "//*")
source_code = elem.get_attribute("outerHTML")
Here's how to save that HTML to a file:
with open('c:/html_source_code.html', 'w', encoding='utf-8') as f:
    f.write(source_code)
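In Python 3, a file opened in text mode expects str rather than bytes, so one option is to pass the encoding to the file API instead of calling .encode() on the string. A minimal sketch, assuming a hypothetical helper name save_html and a temporary output path chosen only for illustration:

```python
import tempfile
from pathlib import Path


def save_html(source_code, path):
    """Write an HTML string to `path` as UTF-8 text (hypothetical helper)."""
    Path(path).write_text(source_code, encoding="utf-8")


# Sample markup with a non-ASCII character to exercise the encoding.
sample = "<p>café</p>"
out = Path(tempfile.gettempdir()) / "html_source_code_example.html"
save_html(sample, out)

# Read the file back to confirm the round trip.
print(out.read_text(encoding="utf-8"))
```

In a real session, the string passed to save_html would be the value returned by get_attribute("outerHTML").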
Upvotes: 103
Reputation: 180
In current versions of php-webdriver (1.12.0+) you have to use
$element->getDomProperty('innerHTML');
as pointed out in this issue: https://github.com/php-webdriver/php-webdriver/issues/929
Upvotes: 0
Reputation: 1211
To start with, download the Python bindings for Selenium WebDriver.
Read the innerHTML attribute to get the source of the element’s content. innerHTML is a property of a DOM element whose value is the HTML between the opening tag and the closing tag.
For example, the innerHTML property of the element below carries the value “a text”:
<p>
a text
</p>
element.get_attribute('innerHTML')
Read the outerHTML property to get the source including the current element. outerHTML is an element property whose value is the HTML of the selected element itself (its opening and closing tags) together with everything between them.
For example, the outerHTML of the element below contains the div and the span inside it:
<div>
<span>Hello there!</span>
</div>
ele.get_attribute("outerHTML")
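To keep the two properties straight, a small wrapper can choose between them. This is only a sketch: element_source and FakeElement are hypothetical names, and FakeElement merely stands in for a real WebElement so the snippet runs without a browser.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (no browser needed)."""

    def __init__(self, tag, inner_html):
        self.tag = tag
        self.inner_html = inner_html

    def get_attribute(self, name):
        # Mimic only the two properties discussed here.
        if name == "innerHTML":
            return self.inner_html
        if name == "outerHTML":
            return f"<{self.tag}>{self.inner_html}</{self.tag}>"
        return None


def element_source(element, include_self=True):
    """Return outerHTML when include_self is True, otherwise innerHTML."""
    return element.get_attribute("outerHTML" if include_self else "innerHTML")


div = FakeElement("div", "<span>Hello there!</span>")
print(element_source(div, include_self=False))
print(element_source(div))
```

With a real driver, the same element_source function should accept whatever find_element returns, since it only relies on get_attribute.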
Upvotes: 3
Reputation: 1302
In PHP Selenium WebDriver you can get page source like this:
$html = $driver->getPageSource();
Or get HTML of the element like this:
// Use 'innerHTML' instead if you only need the HTML of the element's content
$html = $element->getDomProperty('outerHTML');
Upvotes: 0
Reputation: 58
Use execute_script to get the HTML.
BeautifulSoup (bs4) can then be used to access HTML tags quickly.
from bs4 import BeautifulSoup
html = driver.execute_script("return document.documentElement.outerHTML")
bs4_onepage_object=BeautifulSoup(html,"html.parser")
bs4_div_object=bs4_onepage_object.find_all("atag",class_="attribute")
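If BeautifulSoup is not installed, a similar lookup can be sketched with the standard library's html.parser once the HTML string has been retrieved. This swaps bs4 for HTMLParser; the ClassFinder class and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Collect the attributes of every `tag` carrying `class_name` (hypothetical helper)."""

    def __init__(self, tag, class_name):
        super().__init__()
        self.tag = tag
        self.class_name = class_name
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attr_map.get("class") or "").split()
        if tag == self.tag and self.class_name in classes:
            self.matches.append(attr_map)


# Stand-in for the string returned by execute_script(...).
html = '<html><body><a class="link big" href="/a">A</a><a href="/b">B</a></body></html>'
finder = ClassFinder("a", "link")
finder.feed(html)
print(finder.matches)
```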
Upvotes: 0
Reputation: 193048
The other answers provide a lot of detail about retrieving the markup of a WebElement. However, modern websites increasingly rely on JavaScript, ReactJS, jQuery, Ajax, Vue.js, Ember.js, GWT, etc. to render dynamic elements within the DOM tree, so you need to wait for the element and its children to render completely before retrieving the markup.
Ideally, induce WebDriverWait for visibility_of_element_located(), and then use either of the following approaches:
Using get_attribute("outerHTML"):
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#my-id")))
print(element.get_attribute("outerHTML"))
Using execute_script():
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#my-id")))
print(driver.execute_script("return arguments[0].outerHTML;", element))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
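To make the waiting idea concrete without a browser, here is a minimal sketch of the poll-until-ready pattern that an explicit wait relies on. It is not Selenium's implementation; wait_until and rendered_html are hypothetical names, and the "page" is simulated by a counter.

```python
import time


def wait_until(condition, timeout=5.0, poll=0.1):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met in time")
        time.sleep(poll)


# Simulate markup that only becomes available on the third poll.
attempts = {"count": 0}


def rendered_html():
    attempts["count"] += 1
    if attempts["count"] >= 3:
        return "<div id='my-id'>ready</div>"
    return None


html = wait_until(rendered_html, timeout=2.0, poll=0.01)
print(html)
```

The point of the pattern is that the markup is read only after the condition has succeeded, which is what the WebDriverWait calls above arrange for a real element.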
Upvotes: 11
Reputation: 4473
The method to get the rendered HTML I prefer is the following:
driver.get("http://www.google.com")
body_html = driver.find_element_by_xpath("/html/body")
print(body_html.text)
However, the above method removes all the tags (yes, the nested tags as well) and returns only the text content. If you are interested in getting the HTML markup as well, use the method below.
print(body_html.get_attribute("innerHTML"))
Upvotes: 2
Reputation: 17553
innerHTML returns the markup inside the selected element, while outerHTML returns that inner HTML together with the selected element itself.
Example:
Suppose your element is as below:
<tr id="myRow"><td>A</td><td>B</td></tr>
innerHTML returns:
<td>A</td><td>B</td>
outerHTML returns:
<tr id="myRow"><td>A</td><td>B</td></tr>
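The difference between the two can be reproduced offline on the same row. The sketch below uses the standard library's xml.etree.ElementTree rather than Selenium, which works here only because the fragment happens to be well-formed XML; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

row = ET.fromstring('<tr id="myRow"><td>A</td><td>B</td></tr>')

# "Outer" markup: the element itself plus everything inside it.
outer_html = ET.tostring(row, encoding="unicode")

# "Inner" markup: any leading text plus each serialized child.
inner_html = (row.text or "") + "".join(
    ET.tostring(child, encoding="unicode") for child in row
)

print(inner_html)
print(outer_html)
```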
Below is the syntax for the different language bindings. Change innerHTML to outerHTML as required.
Python:
element.get_attribute('innerHTML')
Java:
elem.getAttribute("innerHTML");
If you want the whole page's HTML, use the code below:
driver.getPageSource();
Upvotes: 4
Reputation: 27
And in PHPUnit Selenium test it's like this:
$text = $this->byCssSelector('.some-class-name')->attribute('innerHTML');
Upvotes: -1
Reputation: 719
It looks outdated, but let it be here anyway. The correct way to do it in your case:
elem = wd.find_element_by_css_selector('#my-id')
html = wd.execute_script("return arguments[0].innerHTML;", elem)
or
html = elem.get_attribute('innerHTML')
Both are working for me (selenium-server-standalone-2.35.0).
Upvotes: 7
Reputation: 2368
If you are interested in a solution for Selenium Remote Control in Python, here is how to get innerHTML:
innerHTML = sel.get_eval("window.document.getElementById('prodid').innerHTML")
Upvotes: 0
Reputation: 203
Using the attribute method is, in fact, easier and more straightforward.
Using Ruby with the Selenium and PageObject gems, to get the class associated with a certain element, the line would be element.attribute(Class).
The same concept applies if you wanted to get other attributes tied to the element. For example, if I wanted the string of an element, element.attribute(String).
Upvotes: 6
Reputation: 14279
There is not really a straightforward way of getting the HTML source code of a WebElement. You will have to use JavaScript. I am not too sure about the Python bindings, but you can easily do it like this in Java. I am sure there must be something similar to the JavascriptExecutor class in Python.
WebElement element = driver.findElement(By.id("foo"));
String contents = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].innerHTML;", element);
Upvotes: 102
Reputation: 10434
You can read the innerHTML attribute to get the source of the content of the element, or outerHTML for the source including the current element.
Python:
element.get_attribute('innerHTML')
Java:
elem.getAttribute("innerHTML");
C#:
element.GetAttribute("innerHTML");
Ruby:
element.attribute("innerHTML")
JavaScript:
element.getAttribute('innerHTML');
PHP:
$element->getAttribute('innerHTML');
It was tested and worked with the ChromeDriver.
Upvotes: 1033
Reputation: 9
WebElement element = driver.findElement(By.id("foo"));
String contents = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].innerHTML;", element);
This code really works to get JavaScript from source as well!
Upvotes: -1
Reputation: 121
This works seamlessly for me.
element.get_attribute('innerHTML')
Upvotes: 2
Reputation: 53
I hope this could help: http://selenium.googlecode.com/svn/trunk/docs/api/java/org/openqa/selenium/WebElement.html
The Java method described there is:
java.lang.String getText()
But unfortunately it's not available in Python. So you can translate the method names from Java to Python and try a different approach using the available methods, without getting the whole page source...
E.g.
my_id = elem[0].get_attribute('my-id')
Upvotes: 2
Reputation: 259
In Ruby, using selenium-webdriver (2.32.1), there is a page_source method that returns the entire page source.
Upvotes: 15