Reputation: 347
I am trying to build a web crawler that pulls some information from Yahoo Finance as a personal project. However, I can't extract one particular value from the analysis page of Yahoo Finance. The HTML seems complicated to me; could I get some guidance?
class yhcrawler(scrapy.Spider):
    name = 'yahoo'
    start_urls = [f'https://ca.finance.yahoo.com/quote/{t}/analysis?p={t}' for t in tkrs]

    def parse(self, response):
        filename = 'stock_growths.csv'
        l = response.css('div#YDC-Col1>div>div>div>div>div>section>table>tbody>tr>td#431::text').extract()
        print(l)
This is the selector I am trying:
l = response.css('div#YDC-Col1>div>div>div>div>div>section>table>tbody>tr>td#431::text').extract()
and I am getting an empty result:
2021-04-18 15:12:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ca.finance.yahoo.com/quote/M/analysis?p=M> (referer: None)
[]
The value I am trying to get is the one on the highlighted line: -11.82%.
Upvotes: 0
Views: 89
Reputation: 22440
Try this:
import scrapy


class YahoofinanceSpider(scrapy.Spider):
    name = 'yahoofinance'
    start_urls = ['https://ca.finance.yahoo.com/quote/aapl/analysis?p=aapl']
    # Browser-like User-Agent sent with every request.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    }

    def start_requests(self):
        for start_url in self.start_urls:
            yield scrapy.Request(start_url, headers=self.headers)

    def parse(self, response):
        # Find the label cell containing 'Next 5 Years', then take the text of the cells after it.
        item = response.xpath("//td[./span][contains(.,'Next 5 Years')]/following-sibling::td/text()").getall()
        yield {"item": item}
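
The spider yields the cell text as a list of strings, so a value like -11.82% still arrives as text. If the goal is a file such as stock_growths.csv, a small post-processing step might look like the sketch below. It uses only the standard library. The helper names parse_percent and growth_rows_to_csv, the column names, and the sample values are all hypothetical, and it assumes the first cell in each list is the estimate for the ticker itself, which you should check against the real page.

```python
import csv
import io
from typing import Optional


def parse_percent(text: str) -> Optional[float]:
    """Convert a cell such as '-11.82%' to -11.82; return None for 'N/A' or blanks."""
    cleaned = text.strip().replace(",", "").rstrip("%")
    try:
        return float(cleaned)
    except ValueError:
        return None


def growth_rows_to_csv(rows: dict) -> str:
    """Render {ticker: [cell, ...]} as CSV text, one row per ticker.

    Assumption: the first cell is the ticker's own estimate.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ticker", "next_5_years_pct"])
    for ticker, cells in rows.items():
        value = parse_percent(cells[0]) if cells else None
        # Leave the field empty when no usable number was scraped.
        writer.writerow([ticker, "" if value is None else value])
    return buffer.getvalue()


# Hypothetical scraped values, not real output from the spider.
sample = {"M": ["-11.82%", "N/A"], "XYZ": []}
print(growth_rows_to_csv(sample))
```

Returning None for unparseable cells keeps missing estimates visible as empty fields instead of silently turning them into zeros.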
Upvotes: 1