Reputation:
I want to scrape all href values from the elements with class "news" (the URL is in the code). I tried this code, but it is not working:
Code:
from bs4 import BeautifulSoup
from selenium import webdriver
Base_url = "http://www.thehindubusinessline.com/stocks/abb-india-ltd/overview/"
driver = webdriver.Chrome()
driver.set_window_position(-10000,-10000)
driver.get(Base_url)
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
for div in soup.find_all('div', class_='news'):
    a = div.findAll('a')
    print(a['href'])
Thank you
Upvotes: 2
Views: 158
Reputation: 7248
The content you want is located inside the frame:
<iframe width="100%" frameborder="0" src="http://hindubusiness.cmlinks.com/Companydetails.aspx?&cocode=INE117A01022" id="compInfo" height="600px">...</iframe>
So you first have to switch to that frame before reading driver.page_source. You can do this by adding these lines:
driver.switch_to.default_content()
driver.switch_to.frame('compInfo')
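If you only need the frame's content, one possible alternative to switching frames is to read the iframe's src attribute from the outer page and load that URL directly. The following is a minimal sketch using only the standard library. The class name IframeSrcParser and the outer_html sample string are made up for illustration, and it assumes the iframe tag is actually present in the HTML you have, which may not hold if the frame is injected by JavaScript.

```python
from html.parser import HTMLParser


class IframeSrcParser(HTMLParser):
    """Collect the src attribute of every <iframe> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = {}

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            attributes = dict(attrs)
            # Key by id so a specific frame can be looked up later.
            self.sources[attributes.get("id")] = attributes.get("src")


# Sample markup copied from the frame shown above (assumed to be in the page).
outer_html = (
    '<iframe width="100%" frameborder="0" '
    'src="http://hindubusiness.cmlinks.com/Companydetails.aspx?&cocode=INE117A01022" '
    'id="compInfo" height="600px"></iframe>'
)

parser = IframeSrcParser()
parser.feed(outer_html)
frame_url = parser.sources["compInfo"]
print(frame_url)
```

With the frame URL in hand, you could pass it to driver.get() instead of the outer page, so no frame switch is needed.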
Complete code (making it headless):
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
Base_url = "http://www.thehindubusinessline.com/stocks/abb-india-ltd/overview/"
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)  # newer Selenium releases use `options`; `chrome_options` is deprecated
driver.get(Base_url)
driver.switch_to.frame('compInfo')
soup = BeautifulSoup(driver.page_source, 'lxml')
for link in soup.select('.news a'):
    print(link['href'])
Output:
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17040010444&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17038039002&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17019039003&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17019038003&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17019010085&opt=9
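Note that these hrefs are relative. If you need absolute URLs, a small sketch with urllib.parse.urljoin could look like the following. It assumes the links resolve against the iframe's own URL (the src shown earlier) rather than the outer page, which I have not verified against the live site; the variable names are illustrative only.

```python
from urllib.parse import urljoin

# Assumption: the links are relative to the iframe's URL, not the outer page.
frame_url = "http://hindubusiness.cmlinks.com/Companydetails.aspx?&cocode=INE117A01022"

# Two of the relative hrefs from the output above.
relative_links = [
    "/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17040010444&opt=9",
    "/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17038039002&opt=9",
]

# urljoin keeps the scheme and host of the base and swaps in the new path and query.
absolute_links = [urljoin(frame_url, href) for href in relative_links]
for url in absolute_links:
    print(url)
```

In the scraping loop itself, the same call can wrap link['href'] directly.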
Upvotes: 2
Reputation: 2389
Something like this will work:
for div in soup.find_all('article', 'news'):
    a = div.findAll('a')
    links = [article['href'] for article in a]
    print(links)
Upvotes: 0