Reputation: 137
I've written a Python script using Selenium to get the first link (from the results on duckduckgo.com) for any search term, unless the keyword Ad is right next to that link, as in the image below. If the first link carries that keyword, the script should get the second link and quit.
My current search is houzz
This is my attempt (it always gets the first link, regardless of whether the keyword Ad is present):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = "https://duckduckgo.com/?q={}&ia=web"

def get_info(driver,keyword):
    driver.get(link.format(keyword))
    for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"h2.result__title"))):
        lead_link = item.find_element_by_css_selector("a.result__a").get_attribute("href")
        break
    print(lead_link)

if __name__ == '__main__':
    chromeOptions = webdriver.ChromeOptions()
    chromeOptions.add_argument("--headless")
    driver = webdriver.Chrome(options=chromeOptions)
    wait = WebDriverWait(driver, 10)
    try:
        get_info(driver,"*houzz*")
    finally:
        driver.quit()
How can I fix my script so that it gets the second link when the Ad keyword is adjacent to the first link?
Upvotes: 1
Views: 71
Reputation: 55002
It looks like you just need to add #links to the selector:
lead_link = item.find_element_by_css_selector("#links a.result__a").get_attribute("href")
The ads are inside a separate #ads div.
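For completeness, here is a rough sketch of how that selector might slot into the script. It assumes the page still uses the #links container and a.result__a anchors mentioned above (the markup may well have changed since), and the names first_organic_link and run are made up for illustration. The helper passes the locator strategy as the plain string "css selector", which is the value of Selenium's By.CSS_SELECTOR.

```python
CSS = "css selector"  # same value as selenium.webdriver.common.by.By.CSS_SELECTOR


def first_organic_link(driver, selector="#links a.result__a"):
    """Return the href of the first result anchor inside #links, or None.

    Assumption: ads sit in a separate #ads container, so scoping the
    selector to #links leaves them out.
    """
    anchors = driver.find_elements(CSS, selector)
    return anchors[0].get_attribute("href") if anchors else None


def run(keyword="houzz"):
    # Hypothetical driver setup, mirroring the question's script.
    # Imported lazily so the helper above can be tried without a browser.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get("https://duckduckgo.com/?q={}&ia=web".format(keyword))
        # until() keeps calling the helper with the driver and returns
        # the first truthy value (here, the href string).
        return WebDriverWait(driver, 10).until(first_organic_link)
    finally:
        driver.quit()
```

Calling run("houzz") should then return the first non-ad link, provided the selectors still match the live page.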
Upvotes: 3
Reputation: 25664
You can use the XPath:
//h2[not(./span)]/a
  ^ h2 is the container for the entire link plus Ad icon
    ^ exclude h2s with SPAN children since they contain the Ad icons
                  ^ what you DO want is the A result (hyperlink)
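To see what that expression picks out without launching a browser, here is a small offline sketch. The markup in SAMPLE is invented for illustration and is not DuckDuckGo's real HTML, and non_ad_links is a hypothetical helper. The standard library's ElementTree only supports a subset of XPath (no not()), so the predicate is written out in Python instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the first result carries an Ad badge in a SPAN,
# the second is an ordinary result.
SAMPLE = """
<div>
  <h2 class="result__title">
    <a class="result__a" href="https://ad.example/">Sponsored</a>
    <span class="badge">Ad</span>
  </h2>
  <h2 class="result__title">
    <a class="result__a" href="https://www.houzz.com/">Houzz</a>
  </h2>
</div>
"""


def non_ad_links(markup):
    """Mimic //h2[not(./span)]/a and return the matching hrefs."""
    root = ET.fromstring(markup)
    links = []
    for h2 in root.iter("h2"):           # //h2
        if h2.find("span") is not None:  # [not(./span)]: skip h2s with a SPAN child
            continue
        for a in h2.findall("a"):        # /a: direct A children only
            links.append(a.get("href"))
    return links


print(non_ad_links(SAMPLE))  # ['https://www.houzz.com/']
```

In the Selenium script itself, the same expression would go through driver.find_element(By.XPATH, "//h2[not(./span)]/a"), again assuming the Ad badge is still rendered as a SPAN directly under the h2.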
Upvotes: 2