Reputation: 563
I've been trying to scrape the headlines from Yahoo! Finance's pages for individual stocks. For example, I want the headlines for GOOGL, but I can't find the right CSS selector for BeautifulSoup. Any ideas? I have tried multiple variations of the code below, substituting my selector with "a", "href", "#yui_3_9_1_8_1459741486422_44", "li", "ul", etc. I've left in my latest iteration with the "a" tag, which I know returns all of the page's links, not just the headlines.
import requests
from bs4 import BeautifulSoup

URL = 'http://finance.yahoo.com/q?s=GOOGL'

# Fetch the quote page and raise an exception on an HTTP error
res = requests.get(URL)
res.raise_for_status()

# Parse the HTML and print every link on the page (not just the headlines)
soup = BeautifulSoup(res.content, 'html.parser')
print(soup.select('a'))
http://finance.yahoo.com/q/h?s=GOOGL&t=2016-04-03T21:02:10-04:00
This is what I get when I copy the selector using Chrome's built-in Inspector: #yui_3_9_1_8_1459741486422_44. I've tried every variation of this id I could think of, and nothing has worked.
As far as I can tell, the ystockquote API doesn't have a function that lets you easily get the headlines.
Upvotes: 3
Views: 1066
Reputation: 474271
Get the list of headline links from under the div with the yfi_quote_headline class:
links = soup.select('div.yfi_quote_headline ul > li > a')
for link in links:
    print(link.get_text(strip=True))
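To try the selector without hitting the network, here is a minimal sketch that runs it against a static snippet. The markup in SAMPLE_HTML and the helper name extract_headlines are made up for illustration and only loosely mirror the structure the selector expects; the real page may be laid out differently. Selecting by class is likely more dependable than the copied id, which looks auto-generated (it appears to embed a timestamp) and so probably changes between page loads.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real Yahoo! Finance
# page is assumed, not guaranteed, to follow a similar structure.
SAMPLE_HTML = """
<div class="yfi_quote_headline">
  <ul>
    <li><a href="/news/first.html">First headline</a><cite>Source A</cite></li>
    <li><a href="/news/second.html"> Second headline </a><cite>Source B</cite></li>
  </ul>
</div>
<ul>
  <li><a href="/about.html">Unrelated link</a></li>
</ul>
"""


def extract_headlines(html):
    """Return the text of each link found under div.yfi_quote_headline."""
    soup = BeautifulSoup(html, 'html.parser')
    # Match only <a> tags that are direct children of <li> items in a <ul>
    # nested somewhere inside the headline div.
    links = soup.select('div.yfi_quote_headline ul > li > a')
    return [link.get_text(strip=True) for link in links]


headlines = extract_headlines(SAMPLE_HTML)
for headline in headlines:
    print(headline)
```

The link outside the headline div is skipped because the selector is anchored on the div's class. If the function returns an empty list for the live page, the class name or nesting in the fetched HTML probably differs from what the browser's Inspector shows.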
Upvotes: 2