Reputation: 25
The code works, but each result accumulates the previous ones. Please help me, thank you.
url = 'https://www.bbc.com/news/world'
news_search = driver.find_elements(By.XPATH, "//div[@class='gs-c-promo gs-t-News nw-c-promo gs-o-faux-block-link gs-u-pb gs-u-pb+@m nw-p-default gs-c-promo--inline gs-c-promo--stacked@xl gs-c-promo--flex']")
title = []
link = []
for search in news_search:
    title.append(search.find_element(By.XPATH, "//h3[@class='gs-c-promo-heading__title gel-pica-bold nw-o-link-split__text']").text)
    link.append(search.find_element(By.XPATH, ".//a[@class='gs-c-promo-heading gs-o-faux-block-link__overlay-link gel-pica-bold nw-o-link-split__anchor']").get_attribute('href'))
    print(f'Title:{title}\nLink:{link}')
driver.quit()
Upvotes: 0
Views: 55
Reputation: 193268
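The following minor adjustments can fix your program and improve its performance significantly:
- The <h3> elements are unique, so you can locate them directly.
- The <a> elements are ancestors of the <h3> elements, so each link can be reached from its title.
- As a result, you don't need news_search at all.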
Your optimized code block is:
from selenium.webdriver.common.by import By

# Assumes `driver` is an already-created WebDriver instance.
driver.get('https://www.bbc.com/news/world')
titles = [my_elem.text for my_elem in driver.find_elements(By.XPATH, "//h3[@class='gs-c-promo-heading__title gel-pica-bold nw-o-link-split__text']")]
links = [my_elem.get_attribute("href") for my_elem in driver.find_elements(By.XPATH, "//h3[@class='gs-c-promo-heading__title gel-pica-bold nw-o-link-split__text']/ancestor::a[1]")]
for i, j in zip(titles, links):
    print(f"{i} link is {j}")
driver.quit()
Console Output:
Ukraine nuclear workers: We're kept at gunpoint link is https://www.bbc.com/news/world-europe-62509638
US justice department seeks to unseal Trump warrant link is https://www.bbc.com/news/world-us-canada-62512360
Rollercoaster crash at Legoland Germany injures 31 link is https://www.bbc.com/news/world-europe-62512359
France battles 'monster' wildfire near Bordeaux link is https://www.bbc.com/news/world-europe-62503775
Disabled children abused in Ukraine, warns UN link is https://www.bbc.com/news/disability-62513459
Aggressive dolphin bites two more Japanese swimmers link is https://www.bbc.com/news/world-asia-62508073
...
...
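The pairing idea above (start from each <h3> and climb to its nearest <a>) can be tried offline. This is a minimal sketch using the standard library's xml.etree.ElementTree on a small, made-up snippet of markup instead of the live BBC page. ElementTree has no ancestor axis, so it walks a child-to-parent map to mimic XPath's ancestor::a[1]; the markup, the example.com URLs, and the nearest_ancestor helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the second title is nested one level deeper.
HTML = """<html><body>
  <div class="promo"><a href="https://example.com/one"><h3>First story</h3></a></div>
  <div class="promo"><a href="https://example.com/two"><span><h3>Second story</h3></span></a></div>
</body></html>"""

root = ET.fromstring(HTML)

# ElementTree elements don't know their parent, so build a child -> parent map once.
parent_of = {child: parent for parent in root.iter() for child in parent}


def nearest_ancestor(elem, tag):
    """Return the closest ancestor with the given tag (like ancestor::tag[1]), or None."""
    node = parent_of.get(elem)
    while node is not None and node.tag != tag:
        node = parent_of.get(node)
    return node


pairs = []
for h3 in root.iter("h3"):
    a = nearest_ancestor(h3, "a")
    pairs.append((h3.text, a.get("href") if a is not None else None))

for title, href in pairs:
    print(f"{title} link is {href}")
```

Because each link is derived from its own title element, the two values cannot drift out of step. Zipping two independently collected lists, by contrast, relies on both queries returning the same number of matches in the same order.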
Upvotes: 1
Reputation: 33361
Your code is almost there; you just need to add a leading dot (.) to the title element's XPath expression, as follows:
news_search = driver.find_elements(By.XPATH, "//div[@class='gs-c-promo gs-t-News nw-c-promo gs-o-faux-block-link gs-u-pb gs-u-pb+@m nw-p-default gs-c-promo--inline gs-c-promo--stacked@xl gs-c-promo--flex']")
title = []
link = []
for search in news_search:
    title.append(search.find_element(By.XPATH, ".//h3[@class='gs-c-promo-heading__title gel-pica-bold nw-o-link-split__text']").text)
    link.append(search.find_element(By.XPATH, ".//a[@class='gs-c-promo-heading gs-o-faux-block-link__overlay-link gel-pica-bold nw-o-link-split__anchor']").get_attribute('href'))
    print(f'Title:{title}\nLink:{link}')
driver.quit()
The XPath //h3[@class='gs-c-promo-heading__title gel-pica-bold nw-o-link-split__text'] searches the entire DOM, so find_element returns the first match on the whole page on every iteration, no matter which search element you call it on. With a leading dot, .//h3[@class='gs-c-promo-heading__title gel-pica-bold nw-o-link-split__text'], the search is relative to the current search web element, so it finds the <h3> inside that particular block.
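The difference between a page-wide search and a relative search can be reproduced without a browser. This sketch uses Python's xml.etree.ElementTree on a hypothetical two-block snippet (the markup and URLs are made up). ElementTree refuses paths that start with / when called on an element, so searching from the root element stands in for Selenium's page-wide //h3 query.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two promo blocks, each with its own title.
HTML = """<html><body>
  <div class="promo"><a href="https://example.com/one"><h3>First story</h3></a></div>
  <div class="promo"><a href="https://example.com/two"><h3>Second story</h3></a></div>
</body></html>"""

root = ET.fromstring(HTML)
blocks = root.findall(".//div[@class='promo']")

# Page-wide search (the "//h3" case): the current block is ignored,
# so every iteration returns the first <h3> in the whole document.
from_page = [root.find(".//h3").text for _ in blocks]

# Relative search (the ".//h3" case): each block yields its own <h3>.
from_block = [block.find(".//h3").text for block in blocks]

print(from_page)   # ['First story', 'First story']
print(from_block)  # ['First story', 'Second story']
```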
Upvotes: 1
Reputation: 7
Just add a "." at the start of the XPath in your title.append call, like this:
title.append(search.find_element(By.XPATH, ".//h3[@class='gs-c-promo-heading__title gel-pica-bold nw-o-link-split__text']").text)
Upvotes: 0
Reputation: 69
You were printing the whole lists of titles and links on every iteration, rather than just the current title and link. Change it to the following:
news_search = driver.find_elements(By.XPATH, "//div[@class='gs-c-promo gs-t-News nw-c-promo gs-o-faux-block-link gs-u-pb gs-u-pb+@m nw-p-default gs-c-promo--inline gs-c-promo--stacked@xl gs-c-promo--flex']")
titles = []
links = []
for search in news_search:
    title = search.find_element(By.XPATH, ".//h3[@class='gs-c-promo-heading__title gel-pica-bold nw-o-link-split__text']").text
    link = search.find_element(By.XPATH, ".//a[@class='gs-c-promo-heading gs-o-faux-block-link__overlay-link gel-pica-bold nw-o-link-split__anchor']").get_attribute('href')
    print(f'Title:{title}\nLink:{link}')
    titles.append(title)
    links.append(link)
driver.quit()
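To see why printing the list inside the loop looks like accumulation, here is a pure-Python sketch with made-up titles (no Selenium involved). It records what each version of the print statement would output on every iteration.

```python
# Hypothetical scraped data standing in for the (title, link) pairs.
scraped = [("Story A", "https://example.com/a"), ("Story B", "https://example.com/b")]

titles = []
printed_lists = []    # what print(f'Title:{titles}') shows on each iteration
printed_current = []  # what print(f'Title:{title}') shows on each iteration

for title, _link in scraped:
    titles.append(title)
    # The f-string is evaluated now, so it captures the list as it is at this point.
    printed_lists.append(f"Title:{titles}")
    printed_current.append(f"Title:{title}")

for line in printed_lists:
    print(line)    # the list keeps growing: each line repeats the earlier titles

for line in printed_current:
    print(line)    # one title per line
```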
Upvotes: 1