Reputation: 123
I want to select all the visible text on a web page, with the text of each element/node in the DOM kept separate.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

PATH = r"C:\Program Files (x86)\chromedriver.exe"  # raw string so the backslashes are kept literally
chrome_options = Options()
chrome_options.add_argument("--start-maximized") # must! else results are affected
driver = webdriver.Chrome(PATH, chrome_options=chrome_options)
driver.get("https://www.tesco.com/groceries/en-GB/products/291496210")
elements = driver.find_elements_by_xpath("//html/body//*[@class!='visually-hidden']")
# above xpath expression finds all elements under body that do not have the class of 'visually-hidden'
print(elements)
The problem I am facing is that the first element returned in the elements list contains the text of the whole web page, whereas I would like the text of each node that matches the XPath expression to be a separate WebElement, so that I can read each node's properties on its own.
Please help me out, thanks!
Upvotes: 0
Views: 1049
Reputation: 33351
You should iterate over all the elements, get the text from each one, and print it, like this:
driver.get("https://www.tesco.com/groceries/en-GB/products/291496210")
elements = driver.find_elements_by_xpath("//html/body//*[@class!='visually-hidden']")
for element in elements:
    print(element.text)
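One thing to keep in mind with the loop above: element.text returns an element's visible text including the text of its descendants, so a container near the top of the page repeats almost everything below it. The following is only a sketch of one way to narrow the match to elements that hold text directly. OWN_TEXT_XPATH and collect_texts are hypothetical names, not part of Selenium, and the XPath has not been tuned for the Tesco page.

```python
# Hypothetical XPath (an assumption, not from the original post): elements under
# <body> that have at least one non-blank text node of their own, skipping
# <script>/<style> and anything whose class list contains 'visually-hidden'.
OWN_TEXT_XPATH = (
    "//body//*[not(self::script or self::style)]"
    "[not(contains(concat(' ', normalize-space(@class), ' '), ' visually-hidden '))]"
    "[text()[normalize-space()]]"
)


def collect_texts(elements):
    """Pair each element with its stripped text, dropping elements with no visible text."""
    pairs = []
    for element in elements:
        text = element.text.strip()
        if text:
            pairs.append((element, text))
    return pairs


# Usage with the driver from the snippet above (same Selenium 3 style call):
# for element, text in collect_texts(driver.find_elements_by_xpath(OWN_TEXT_XPATH)):
#     print(element.tag_name, repr(text))
```

Note that @class!='visually-hidden' only matches elements that actually have a class attribute, while the contains(...) test in this sketch also keeps elements with no class at all. An element with mixed content, such as a paragraph containing a bold word, will still report its children's text through element.text.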
Also, you should add some delay so that the page is fully loaded before you get all those elements and extract their texts.
The simplest way to do that, although not the recommended one, is to add a sleep, like this:
import time

driver.get("https://www.tesco.com/groceries/en-GB/products/291496210")
time.sleep(10)  # fixed pause to give the page time to load
elements = driver.find_elements_by_xpath("//html/body//*[@class!='visually-hidden']")
for element in elements:
    print(element.text)
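A common alternative to a fixed sleep is an explicit wait, which polls until a condition holds or a timeout expires. Below is a minimal sketch using Selenium's WebDriverWait and expected_conditions. The helper name wait_for_elements and the 10 second default are assumptions for illustration. The condition only waits until at least one matching element is present, so it does not guarantee that the whole page has finished rendering.

```python
def wait_for_elements(driver, xpath, timeout=10):
    """Poll until at least one element matches `xpath`, then return all matches.

    Raises selenium.common.exceptions.TimeoutException if nothing matches
    within `timeout` seconds. (Hypothetical helper, not part of Selenium.)
    """
    # Imported inside the function so this sketch can be loaded
    # even on a machine where Selenium is not installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    return WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.XPATH, xpath))
    )


# Usage with the driver from the snippets above:
# driver.get("https://www.tesco.com/groceries/en-GB/products/291496210")
# elements = wait_for_elements(driver, "//html/body//*[@class!='visually-hidden']")
# for element in elements:
#     print(element.text)
```

Because almost any page matches such a broad XPath as soon as the body exists, it is usually more useful to wait for one specific element you actually need and only then collect the rest.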
Upvotes: 0