Reputation: 9538
This is the Python Selenium code I am trying to use to get the title of the articles:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

# Raw string so the backslashes in the Windows path are taken literally
PATH = r"C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://techwithtim.net")

search = driver.find_element_by_name("s")
search.send_keys("test")
search.send_keys(Keys.RETURN)

try:
    main = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.ID, "main"))
    )
    articles = main.find_elements_by_tag_name("article")
    for article in articles:
        header = article.find_elements_by_tag_name("a")[0]
        #print(header.get_attribute('href'))
        print(header.text)
finally:
    time.sleep(5)
    driver.quit()
The code works well when extracting the href attribute, but it doesn't work for .text: I get empty lines instead of the article headers.
How can I fix that?
Upvotes: 0
Views: 1080
Reputation: 12499
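Selenium's .text returns only text that is rendered as visible, so it can come back empty when the link is hidden. You may want innerHTML instead: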
print(header.get_attribute('innerHTML'))
innerHTML keeps HTML entities, so to turn &amp; back into an ampersand, try
print(header.get_attribute('innerHTML').replace('&amp;', '&'))
Or just use the innerText property:
print(header.get_attribute('innerText'))
Or the textContent property:
print(header.get_attribute('textContent'))
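The options above could be wrapped in small helpers, roughly like this. It is only a sketch: element_text and inner_html_text are hypothetical names, not part of Selenium, and they assume the object passed in behaves like a WebElement (a .text attribute and a get_attribute() method). html.unescape from the standard library decodes every entity, not just &amp;.

```python
import html


def element_text(element):
    """Return an element's text, falling back to the DOM when .text is empty.

    Hypothetical helper: `element` is assumed to behave like a Selenium
    WebElement (has `.text` and `.get_attribute(name)`).
    """
    # .text only reports text that is rendered as visible on the page.
    text = element.text
    if not text:
        # textContent is read from the DOM regardless of visibility.
        text = element.get_attribute("textContent") or ""
    return text.strip()


def inner_html_text(element):
    """Return innerHTML with entities such as &amp; decoded.

    Hypothetical helper: any nested tags inside the element are left
    in the returned string; only the entities are decoded.
    """
    raw = element.get_attribute("innerHTML") or ""
    return html.unescape(raw).strip()
```

In the loop from the question, print(header.text) would then become print(element_text(header)).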
Upvotes: 2