robots.txt

Reputation: 137

Trouble getting the second link when the first link has a certain keyword right next to it

I've written a Python script using Selenium to get the first link that duckduckgo.com returns for a search term, unless the keyword Ad appears right next to that link, as in the image below. If the first link carries that keyword, the script should get the second link instead and then quit.

My current search is houzz

enter image description here

This is my attempt (it always gets the first link, regardless of whether the Ad keyword is present):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = "https://duckduckgo.com/?q={}&ia=web"

def get_info(driver,keyword):
    driver.get(link.format(keyword))
    for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"h2.result__title"))):
        lead_link = item.find_element_by_css_selector("a.result__a").get_attribute("href")
        break
    print(lead_link)

if __name__ == '__main__':
    chromeOptions = webdriver.ChromeOptions()
    chromeOptions.add_argument("--headless")
    driver = webdriver.Chrome(options=chromeOptions)
    wait = WebDriverWait(driver, 10)
    try:
        get_info(driver,"*houzz*")
    finally:
        driver.quit()

How can I fix my script so that it gets the second link when the Ad keyword is adjacent to the first link?

Upvotes: 1

Views: 71

Answers (2)

pguardiario

Reputation: 55002

It looks like you just need to add #links to the selector:

lead_link = item.find_element_by_css_selector("#links a.result__a").get_attribute("href")

The ads are inside a separate #ads div, so scoping the selector to #links skips them.
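
One caveat with the line above: the loop in the question still visits every h2.result__title, and an h2 that sits inside #ads has no descendant matching "#links a.result__a", so find_element may raise NoSuchElementException for it. A minimal sketch of one way around that, assuming the organic results really do live under #links, is to query the anchors from the driver directly. The helper name first_organic_link is hypothetical, and "css selector" is simply the string value of By.CSS_SELECTOR, written out so the snippet needs no imports:

```python
# Hypothetical helper, not part of Selenium.
# Assumes organic results are under #links and ads under #ads, as described above.
def first_organic_link(driver):
    """Return the href of the first non-ad result, or None if none is found."""
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    anchors = driver.find_elements("css selector", "#links a.result__a")
    for anchor in anchors:
        href = anchor.get_attribute("href")
        if href:  # skip anchors that carry no href
            return href
    return None
```

In the question's script this could replace the for loop, called as print(first_organic_link(driver)) once wait.until(...) has confirmed that the results are present.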

Upvotes: 3

JeffC

Reputation: 25664

You can use the XPath

//h2[not(./span)]/a
  ^ h2 is the container for the entire link plus Ad icon
    ^ exclude h2s with SPAN children since they contain the Ad icons
                  ^ what you DO want is the A result (hyperlink)
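
In Selenium the expression would be passed as driver.find_element(By.XPATH, "//h2[not(./span)]/a"). To see what it selects without launching a browser, here is a rough sketch that mimics the same predicate on a made-up snippet with the standard library. The markup and the function name organic_hrefs are invented for illustration and are not DuckDuckGo's actual page; ElementTree's XPath subset has no not(), so the filter is written in plain Python instead:

```python
import xml.etree.ElementTree as ET

# Made-up markup for illustration only; not DuckDuckGo's actual page.
SAMPLE = """
<div>
  <h2 class="result__title"><a href="https://ad.example/">Sponsored</a><span>Ad</span></h2>
  <h2 class="result__title"><a href="https://www.houzz.com/">Houzz</a></h2>
  <h2 class="result__title"><a href="https://example.org/">Other</a></h2>
</div>
"""

def organic_hrefs(markup):
    """Mimic //h2[not(./span)]/a: anchors directly under h2s that have no span child."""
    root = ET.fromstring(markup)
    hrefs = []
    for h2 in root.iter("h2"):
        if h2.find("span") is None:      # not(./span): no direct span child
            for a in h2.findall("a"):    # /a: direct anchor children only
                hrefs.append(a.get("href"))
    return hrefs

print(organic_hrefs(SAMPLE))
```

The first h2 is skipped because of its span child, so the first entry in the returned list is the houzz link.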

Upvotes: 2
