Reputation: 185
Here is the deal: I have a website from which I want to extract some hrefs, specifically the ones that have the text "LEIA ESTA EDIÇÃO", as in this HTML:
<a href="http://acervo.estadao.com.br/pagina/#!/20120824-43410-spo-1-pri-a1-not/busca/ministro+Minist%C3%A9rio" title="LEIA ESTA EDIÇÃO" style="" class="" xpath="1">LEIA ESTA EDIÇÃO</a>
This is the code I have. It's pretty wrong; I was running some tests to see if it works. By the way, it has to be Selenium.
driver = webdriver.Chrome()
x = 1
while True:
    try:
        link = ("http://acervo.estadao.com.br/procura/#!/ministro%3B minist%C3%A9rio|||/Acervo/capa//{}/2000|2010|2010///Primeira").format(x)
        driver.get(link)
        time.sleep(1)
        xpath = "//a[contains(text(),'LEIA ESTA EDIÇÃO')]"
        links = driver.find_elements_by_xpath(xpath)
        bw=('')
        for link in links:
            bw += link._element.get_attribute("href")
        print (bw)
        x = x + 1
        time.sleep(1)
    except NoSuchElementException:
        pass
    print(x)
    time.sleep(1)
Upvotes: 2
Views: 701
Reputation: 31
I would really recommend reading the Selenium docs; the explanations there are easy and straightforward.
There are some places where your code can be improved:
You should get a list of links and extract the hrefs from them. A simple one-liner can be (if there is at least one a tag with that text):
[a_tag.get_attribute('href') for a_tag in driver.find_elements_by_link_text("LEIA ESTA EDIÇÃO")]
The bw variable: it will become one concatenated string of all the hrefs. I am pretty sure that is not what you are looking for; you probably want a list or another data structure.
I would recommend reading this answer about string concatenation in Python.
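To illustrate the point about bw, here is a minimal sketch of collecting the hrefs into a list instead of one long string. The helper name hrefs_from_elements and the FakeElement class are made up for this example; FakeElement only stands in for a Selenium element so the snippet runs without a browser. With a real driver you would pass in the list returned by the element lookup.

```python
def hrefs_from_elements(elements):
    """Return the non-empty href of each element, in order."""
    hrefs = []
    for element in elements:
        # get_attribute returns None when the attribute is missing
        href = element.get_attribute("href")
        if href:
            hrefs.append(href)
    return hrefs


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


elements = [
    FakeElement("http://example.com/a"),
    FakeElement(None),
    FakeElement("http://example.com/b"),
]
hrefs = hrefs_from_elements(elements)

# One URL per line is readable; "".join(hrefs) would glue them together,
# which is what the bw += ... loop effectively does.
print("\n".join(hrefs))
```

Keeping the list around also makes it easy to count, deduplicate, or write the links to a file later.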
Upvotes: 1
Reputation: 52665
You can try the code below to get the required output:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver.get(link)
links = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.LINK_TEXT, "LEIA ESTA EDIÇÃO")))
references = [link.get_attribute("href") for link in links]
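The snippet above handles a single page, while the question loops over page numbers with while True and never exits. A rough sketch of one way to stop is below, assuming that a page with no matching links means the results have run out. The names collect_all_hrefs and fetch_page are hypothetical: fetch_page would wrap driver.get plus the wait above and return an empty list when the wait raises TimeoutException. Here it is replaced by a fake so the example runs without a browser.

```python
from typing import Callable, List


def collect_all_hrefs(
    fetch_page: Callable[[int], List[str]],
    first_page: int = 1,
    max_pages: int = 1000,
) -> List[str]:
    """Call fetch_page for consecutive page numbers until one returns no hrefs."""
    all_hrefs: List[str] = []
    for page in range(first_page, first_page + max_pages):
        hrefs = fetch_page(page)
        if not hrefs:
            # Assumption: an empty page marks the end of the results.
            break
        all_hrefs.extend(hrefs)
    return all_hrefs


# Fake data standing in for what Selenium would return per page.
fake_pages = {
    1: ["http://example.com/1a", "http://example.com/1b"],
    2: ["http://example.com/2a"],
}
result = collect_all_hrefs(lambda page: fake_pages.get(page, []))
print(result)
```

The max_pages cap is only a safety net so that a site which always returns links cannot keep the loop running forever.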
Upvotes: 3