Reputation: 306
I am trying to fetch the links to all news articles about Apple from this page: https://finance.yahoo.com/quote/AAPL/news?p=AAPL. The page also contains many advertisement links, as well as links to other pages of the website. How do I fetch only the links to news articles? Here is the code I have written so far:
from selenium import webdriver
import requests
from bs4 import BeautifulSoup

driver = webdriver.Chrome(executable_path='C:\\Users\\Home\\OneDrive\\Desktop\\AJ\\chromedriver_win32\\chromedriver.exe')
driver.get("https://finance.yahoo.com/quote/AAPL/news?p=AAPL")

# collect the href of every anchor on the page (ads and navigation included)
links = []
for a in driver.find_elements_by_xpath('.//a'):
    links.append(a.get_attribute('href'))

def get_info(url):
    # send request
    response = requests.get(url)
    # parse with an explicit parser to avoid the "no parser specified" warning
    soup = BeautifulSoup(response.text, 'html.parser')
    # get information we need
    news = soup.find('div', attrs={'class': 'caas-body'}).text
    headline = soup.find('h1').text
    date = soup.find('time').text
    return news, headline, date
Can anyone explain how to do this, or point me to a resource that covers it? Thanks!
Upvotes: 0
Views: 288
Reputation: 3433
Try this XPath to get all the news links from that page:
//li[contains(@class,'js-stream-content')]/div[@data-test-locator='mega']//h3/a
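import time

# driver is the webdriver.Chrome instance created in the question
driver.implicitly_wait(10)
driver.maximize_window()
driver.get("https://finance.yahoo.com/quote/AAPL/news?p=AAPL")
time.sleep(10)  # give the news stream time to load
# each match is the headline anchor inside one news stream item
links = driver.find_elements_by_xpath("//li[contains(@class,'js-stream-content')]/div[@data-test-locator='mega']//h3/a")
for link in links:
    print(link.get_attribute("href"))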
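
If the class names in that XPath ever change, a fallback is to keep collecting every href as in the question and filter the list by URL afterwards. This is only a rough sketch: it assumes article URLs sit on finance.yahoo.com under a /news/ path, which is a guess about the site's URL scheme and not something the page guarantees. The function name filter_news_links and its host and prefix parameters are made up for illustration.

```python
from urllib.parse import urlparse


def filter_news_links(hrefs, host="finance.yahoo.com", prefix="/news/"):
    """Return unique hrefs on `host` whose path starts with `prefix`, in input order.

    The default host and prefix are assumptions about the site's URL layout.
    """
    seen = set()
    kept = []
    for href in hrefs:
        if not href:  # get_attribute('href') gives None for anchors without an href
            continue
        parts = urlparse(href)
        if parts.netloc != host or not parts.path.startswith(prefix):
            continue  # ad, navigation, or off-site link
        if href not in seen:  # the same article can be linked more than once
            seen.add(href)
            kept.append(href)
    return kept


# Made-up sample standing in for the `links` list built in the question
sample = [
    "https://finance.yahoo.com/news/example-article-123.html",
    None,
    "https://finance.yahoo.com/quote/AAPL/options",
    "https://ads.example.com/click?id=1",
    "https://finance.yahoo.com/news/example-article-123.html",
]
print(filter_news_links(sample))
```

With a real driver session you would pass in the links list from the question instead of sample. The XPath above is more precise when it matches; the URL filter is just less tied to the page markup.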
Upvotes: 1