huy

Reputation: 306

Fetch only specific links using Selenium in Python

I am trying to fetch the links to all news articles about Apple from this page: https://finance.yahoo.com/quote/AAPL/news?p=AAPL. The page also contains many advertisement links and links to other parts of the site. How do I fetch only the links to news articles? Here is the code I have written so far:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(executable_path='C:\\Users\\Home\\OneDrive\\Desktop\\AJ\\chromedriver_win32\\chromedriver.exe')
driver.get("https://finance.yahoo.com/quote/AAPL/news?p=AAPL")
links = []
for a in driver.find_elements(By.XPATH, './/a'):
    links.append(a.get_attribute('href'))

def get_info(url):
    # Send the request
    response = requests.get(url)
    # Parse the HTML, naming the parser explicitly
    soup = BeautifulSoup(response.text, 'html.parser')
    # Extract the article body, headline, and publication date
    news = soup.find('div', attrs={'class': 'caas-body'}).text
    headline = soup.find('h1').text
    date = soup.find('time').text
    return news, headline, date

Can anyone explain how to do this, or point me to a resource that would help? Thanks!

Upvotes: 0

Views: 288

Answers (1)

pmadhu

Reputation: 3433

Try this XPath to get all the news links from that page:

//li[contains(@class,'js-stream-content')]/div[@data-test-locator='mega']//h3/a

import time
from selenium.webdriver.common.by import By

# `driver` is the webdriver.Chrome instance created in the question
driver.implicitly_wait(10)
driver.maximize_window()

driver.get("https://finance.yahoo.com/quote/AAPL/news?p=AAPL")
time.sleep(10)  # give the news stream time to load
links = driver.find_elements(By.XPATH, "//li[contains(@class,'js-stream-content')]/div[@data-test-locator='mega']//h3/a")
for link in links:
    print(link.get_attribute("href"))
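
If the page markup changes and the XPath stops matching, a fallback is to collect every href as in the question and then filter by URL shape. The following is only a minimal sketch under an assumption that is not confirmed anywhere in this thread: that article pages live on finance.yahoo.com under /news/ and end in .html. The helper names is_article_link and filter_article_links are hypothetical, and the sample URLs are made up for illustration.

```python
from urllib.parse import urlparse


def is_article_link(href):
    """Return True if href looks like a Yahoo Finance news article URL.

    Assumption (not from the original post): article URLs are on
    finance.yahoo.com, under /news/, and end in .html.
    """
    if not href:  # get_attribute('href') can return None for some anchors
        return False
    parts = urlparse(href)
    return (
        parts.netloc == "finance.yahoo.com"
        and parts.path.startswith("/news/")
        and parts.path.endswith(".html")
    )


def filter_article_links(hrefs):
    """Keep article-like links, dropping duplicates while preserving order."""
    seen = set()
    result = []
    for href in hrefs:
        if is_article_link(href) and href not in seen:
            seen.add(href)
            result.append(href)
    return result


if __name__ == "__main__":
    # Made-up sample values standing in for the hrefs collected by Selenium
    sample = [
        "https://finance.yahoo.com/news/example-article-123.html",
        "https://finance.yahoo.com/quote/AAPL/news?p=AAPL",
        None,
        "https://finance.yahoo.com/news/example-article-123.html",
    ]
    print(filter_article_links(sample))
```

The hrefs gathered by the loop in the question could be passed to filter_article_links, and each surviving URL handed to get_info. Check the pattern against a few real links from the page before relying on it.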

Upvotes: 1
