dataelephant

Reputation: 563

Scraping Headlines off of Yahoo! Finance with Python3

I've been trying to scrape the headlines from Yahoo! Finance's pages for individual stocks. For example, I want the headlines for GOOGL, but I can't find the right CSS selector to pass to BeautifulSoup. Any ideas? I have tried multiple variations of the code below, substituting selectors such as "a", "href", "#yui_3_9_1_8_1459741486422_44", "li", and "ul". I've left in my latest iteration, which uses the "a" tag; I know it returns all of the page's links, not just the headlines.

import requests
from bs4 import BeautifulSoup

URL = 'http://finance.yahoo.com/q?s=GOOGL'
res = requests.get(URL)
res.raise_for_status()  # stop early on an HTTP error
soup = BeautifulSoup(res.content, 'html.parser')
# Selects every link on the page, not just the headlines
print(soup.select('a'))

http://finance.yahoo.com/q/h?s=GOOGL&t=2016-04-03T21:02:10-04:00

When I copy the selector for a headline using Chrome's built-in Inspector, I get #yui_3_9_1_8_1459741486422_44. I've tried every variation of this id I could think of, and nothing has worked.

As far as I can tell, the ystockquote API doesn't have a function for retrieving the headlines.

Upvotes: 3

Views: 1066

Answers (1)

alecxe

Reputation: 474271

Get the list of headline links from under the div with the yfi_quote_headline class:

links = soup.select('div.yfi_quote_headline ul > li > a')
for link in links:
    print(link.get_text(strip=True))
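
To try the selector without hitting the network, here is a minimal sketch that runs it against a small static snippet. The markup in SAMPLE_HTML, its URLs, and the headline text are made up for illustration and only assume that the headlines sit in a ul inside a div with the yfi_quote_headline class; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an assumed stand-in for the headline block,
# plus an unrelated link that the selector should ignore.
SAMPLE_HTML = """
<div class="yfi_quote_headline">
  <ul>
    <li><a href="/news/one.html">First headline</a><cite>Source A</cite></li>
    <li><a href="/news/two.html">Second headline</a></li>
  </ul>
</div>
<div class="other">
  <ul><li><a href="/nav">Navigation link</a></li></ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Match only <a> tags that are direct children of an <li>, which is a direct
# child of a <ul> somewhere inside the headline div.
links = soup.select('div.yfi_quote_headline ul > li > a')

# Collect (title, href) pairs so the results can be reused later.
headlines = [(link.get_text(strip=True), link.get('href')) for link in links]

for title, href in headlines:
    print(title, href)
```

Selecting by class rather than by the copied id is probably the safer choice here: the id #yui_3_9_1_8_1459741486422_44 appears to embed a millisecond timestamp, which suggests it is generated in the browser on each page load and would not be present in the HTML that requests downloads.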

Upvotes: 2
