Reputation: 861
I'm using Selenium with Python for scraping. I can't retrieve some values even though they are displayed in the browser.
So I checked the HTML source code and found that the values are not there, as shown below.
HTML
<div id="pos-list-body" class="list-body">
</div>
But the values are there when I inspect the element with the developer tools in Chrome.
DevTools
<div id="pos-list-body" class="list-body">
<div class="list-body-row" id="pos-row-1">
<div class="pos-list-col-1">
<input class="list-checkbox" type="checkbox" value="1">
</div>
<div class="detail-data pos-list-col-2">
1
</div>
<div class="detail-data pos-list-col-3">
a
</div>
...
</div>
<div class="list-body-row" id="pos-row-2">
<div class="pos-list-col-1">
<input class="list-checkbox" type="checkbox" value="2">
</div>
<div class="detail-data pos-list-col-2">
2
</div>
<div class="detail-data pos-list-col-3">
b
</div>
...
</div>
...
</div>
It seems that these values are generated by JavaScript or something similar.
There is no iframe in the source code.
How can I get these values with Python?
I would appreciate any hints.
Upvotes: 1
Views: 2266
Reputation: 1747
Since the other answers don't seem to solve your issue, one possibility remains: there may be more than one HTML element in the DOM with the ID pos-list-body, and the first element retrieved by this ID is actually empty and is not the one you are targeting.
Solution: try selecting the <div> using XPath instead of the ID, or get all the elements with this ID in a list and print the innerHTML of each one to find the index of your target element.
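One way to check for duplicate IDs is sketched below. It is only an illustration: the helper names `inner_html_of_all` and `first_non_empty_index` are made up for this example, and it assumes `driver` is an already running Selenium WebDriver with the page loaded. It uses `execute_script` with an attribute selector, because `[id="..."]` matches every element carrying that ID, whereas `getElementById` returns only the first one.

```python
# JavaScript run in the page: collect the innerHTML of every element whose
# id attribute equals the value passed in as arguments[0].
FIND_BY_ID_SCRIPT = """
var selector = '[id="' + arguments[0] + '"]';
return Array.from(document.querySelectorAll(selector)).map(
    function (el) { return el.innerHTML; }
);
"""


def inner_html_of_all(driver, element_id="pos-list-body"):
    """Return a list with the innerHTML of each element that has this id."""
    return driver.execute_script(FIND_BY_ID_SCRIPT, element_id)


def first_non_empty_index(fragments):
    """Return the index of the first fragment that is not just whitespace."""
    for index, html in enumerate(fragments):
        if html.strip():
            return index
    return None  # every matching element was empty
```

If `inner_html_of_all(driver)` returns more than one entry, the ID is duplicated, and `first_non_empty_index` gives the position of the element that actually holds the rows. If it returns a single empty entry, the ID is unique and the rows have probably not been rendered yet.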
Upvotes: 0
Reputation: 193338
The outerHTML property of an Element returns a string containing the serialized HTML of the element and its descendants. Setting outerHTML replaces the element and all of its descendants with a new DOM tree parsed from the given string. To obtain only the HTML of the element's contents, without the element itself, use the innerHTML property instead.
To get the HTML generated by JavaScript, you can use the following:
print(driver.execute_script("return document.getElementById('pos-list-body').outerHTML"))
Upvotes: 0
Reputation: 29382
If the ID pos-list-body is unique in the HTML DOM, then your best bet is to use an explicit wait and read innerText.
Code:
wait = WebDriverWait(driver, 20)
print(wait.until(EC.presence_of_element_located((By.ID, "pos-list-body"))).get_attribute('innerText'))
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
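Note that the empty container is already present in the static HTML, so waiting for pos-list-body itself may return before JavaScript has filled it in. A possible variation, sketched here under assumptions, is to wait for the rows instead. The names `ROWS_SCRIPT` and `rows_loaded` are invented for this example, and the selectors are taken from the DevTools markup in the question. `WebDriverWait.until` accepts any callable that takes the driver, and keeps polling while it returns a falsy value.

```python
# JavaScript run in the page: for each row inside the container, collect the
# trimmed text of its ".detail-data" cells. Returns a list of lists.
ROWS_SCRIPT = """
return Array.from(
    document.querySelectorAll('#pos-list-body .list-body-row')
).map(function (row) {
    return Array.from(row.querySelectorAll('.detail-data')).map(
        function (cell) { return cell.textContent.trim(); }
    );
});
"""


def rows_loaded(driver):
    """Wait condition: return the row values once any row exists, else False."""
    rows = driver.execute_script(ROWS_SCRIPT)
    return rows if rows else False


# Usage, with the imports listed above and a running driver (assumed):
#   rows = WebDriverWait(driver, 20).until(rows_loaded)
```

The result is one list per row, each holding the text of that row's detail-data cells. If no row appears within the timeout, `until` raises a TimeoutException, which suggests the data is loaded some other way or the selectors need adjusting.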
Upvotes: 1