Kat

Reputation: 1

Quarterly Yahoo Finance Data using R

I am trying to scrape financial data from Yahoo Finance in R. I found a solution that works for annual data, but I can't figure out how to adapt it to return quarterly data instead, and I am wondering if I am on the wrong path. Here's the solution that worked for me: R: web scraping yahoo.finance after 2019 change

Upvotes: 0

Views: 647

Answers (2)

Emmanuel Hamel

Reputation: 2243

You can also use the R package RSelenium to switch the page to quarterly data:

library(rvest)
library(stringr)
library(magrittr)
library(RSelenium)

# Start a Selenium server in Docker (shell() is Windows-only; use system() elsewhere)
shell('docker run -d -p 4445:4444 selenium/standalone-firefox')
Sys.sleep(5)  # give the container a moment to start before connecting
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
remDr$navigate("https://finance.yahoo.com/quote/AAPL/financials?p=AAPL")

# Click the "Quarterly" button, then give the page time to load the new data
web_Obj_Quaterly <- remDr$findElement("xpath", '//*[@id="Col1-1-Financials-Proxy"]/section/div[1]/div[2]/button/div/span')
web_Obj_Quaterly$clickElement()
Sys.sleep(3)

page_Content <- remDr$getPageSource()[[1]]

page <- read_html(page_Content)
nodes <- page %>% html_nodes(".fi-row")
df <- NULL

# Build the table one row at a time: the row label followed by its column values
for (i in nodes) {
  r <- list(i %>% html_nodes("[title],[data-test='fin-col']") %>% html_text())
  df <- rbind(df, as.data.frame(matrix(r[[1]], ncol = length(r[[1]]), byrow = TRUE), stringsAsFactors = FALSE))
}

# Column headers: the period end dates shown on the page
matches <- str_match_all(page %>% html_node('#Col1-3-Financials-Proxy') %>% html_text(), '\\d{1,2}/\\d{1,2}/\\d{4}')
headers <- c('Breakdown', 'TTM', matches[[1]][, 1])
names(df) <- headers
View(df)
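
Because the quarterly table is loaded only after the click, reading the page source too early may still return the annual figures. One possible way to wait, rather than relying on a fixed delay, is a small polling helper. This is only a sketch: `wait_until` is a hypothetical helper (not part of RSelenium), and comparing page sources is a crude signal that assumes nothing else on the page changes in the meantime.

```r
# Hypothetical helper: call `condition` repeatedly until it returns TRUE
# or `timeout` seconds have passed. Errors inside `condition` count as FALSE.
wait_until <- function(condition, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    ok <- tryCatch(isTRUE(condition()), error = function(e) FALSE)
    if (ok) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Possible usage with the session above (assumes remDr is already open):
# before <- remDr$getPageSource()[[1]]
# web_Obj_Quaterly$clickElement()
# changed <- wait_until(function() {
#   !identical(remDr$getPageSource()[[1]], before)
# })
# if (!changed) stop("Page did not update after clicking 'Quarterly'")
```

The helper returns FALSE on timeout instead of raising an error, so the caller can decide whether to retry or stop.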

This answer relies on: R: web scraping yahoo.finance after 2019 change

Upvotes: 0

putty

Reputation: 764

One of the problems with scraping that page is that it defaults to Annual data. The quarterly data is loaded within the browser after a user clicks the "Quarterly" button. While that's bad for scraping, it's good for intercepting API requests. If you open your developer's console in a browser, go to the Network tab, and then select the "Quarterly" button, you'll see a request made (I put the URL down at the bottom as it's really long). The request will return JSON data.

Disclaimer: I do not know a lot about R. But, after doing a little bit of research, I found that R has a couple of packages that allow you to read JSON data, and you can do something like this:

# using rjson
url <- "<get from down below>"
data <- rjson::fromJSON(file = url)

# using jsonlite
library(jsonlite)

url <- "<get from down below>"
data <- fromJSON(url)

Here's the URL:

https://query1.finance.yahoo.com/ws/fundamentals-timeseries/v1/finance/premium/timeseries/AAPL?lang=en-US&region=US&symbol=AAPL&padTimeSeries=true&type=annualEbitda%2CtrailingEbitda%2CannualDilutedAverageShares%2CtrailingDilutedAverageShares%2CannualBasicAverageShares%2CtrailingBasicAverageShares%2CannualDilutedEPS%2CtrailingDilutedEPS%2CannualBasicEPS%2CtrailingBasicEPS%2CannualNetIncomeCommonStockholders%2CtrailingNetIncomeCommonStockholders%2CannualNetIncome%2CtrailingNetIncome%2CannualNetIncomeContinuousOperations%2CtrailingNetIncomeContinuousOperations%2CannualTaxProvision%2CtrailingTaxProvision%2CannualPretaxIncome%2CtrailingPretaxIncome%2CannualOtherIncomeExpense%2CtrailingOtherIncomeExpense%2CannualInterestExpense%2CtrailingInterestExpense%2CannualOperatingIncome%2CtrailingOperatingIncome%2CannualOperatingExpense%2CtrailingOperatingExpense%2CannualSellingGeneralAndAdministration%2CtrailingSellingGeneralAndAdministration%2CannualResearchAndDevelopment%2CtrailingResearchAndDevelopment%2CannualGrossProfit%2CtrailingGrossProfit%2CannualCostOfRevenue%2CtrailingCostOfRevenue%2CannualTotalRevenue%2CtrailingTotalRevenue&merge=false&period1=493590046&period2=1596836602&corsDomain=finance.yahoo.com
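
Note that the `type` parameter in the URL above lists `annual...` and `trailing...` series. A rough sketch for assembling the same URL with a different prefix follows. It assumes the endpoint also accepts `quarterly`-prefixed names such as `quarterlyTotalRevenue`, which should be checked against the request shown in the Network tab; `build_timeseries_url` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: assemble a fundamentals-timeseries URL for one symbol.
# `metrics` are series names without a prefix, e.g. "TotalRevenue".
# ASSUMPTION: the "quarterly" prefix is accepted like "annual" and "trailing".
build_timeseries_url <- function(symbol, metrics, prefix = "quarterly",
                                 period1 = 493590046, period2 = 1596836602) {
  types <- paste0(prefix, metrics)
  base <- paste0(
    "https://query1.finance.yahoo.com/ws/fundamentals-timeseries/v1/",
    "finance/premium/timeseries/", symbol
  )
  # Same query parameters, in the same order, as the URL captured above
  query <- c(
    lang = "en-US", region = "US", symbol = symbol, padTimeSeries = "true",
    type = paste(types, collapse = "%2C"),
    merge = "false",
    # format() avoids scientific notation for round timestamps such as 1e9
    period1 = format(period1, scientific = FALSE, trim = TRUE),
    period2 = format(period2, scientific = FALSE, trim = TRUE),
    corsDomain = "finance.yahoo.com"
  )
  paste0(base, "?", paste0(names(query), "=", query, collapse = "&"))
}

url <- build_timeseries_url("AAPL", c("TotalRevenue", "NetIncome"))
# data <- jsonlite::fromJSON(url)  # requires network access
```

The default `period1` and `period2` values are the Unix timestamps taken from the captured URL; pass your own to change the date range.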

There's another URL that returns quarterly income statement data, but it seems to be a little erratic for companies outside the U.S.:

https://query2.finance.yahoo.com/v10/finance/quoteSummary/aapl?modules=incomeStatementHistoryQuarterly
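
As a rough illustration of what to do with the JSON from this second URL, the sketch below flattens it into one row per quarter. The nesting (`quoteSummary` > `result` > `incomeStatementHistoryQuarterly` > `incomeStatementHistory`, with `raw` and `fmt` sub-fields) is an assumption about the response shape and should be checked against the real response. The sample payload uses made-up numbers, and `quarterly_income` is a hypothetical helper.

```r
library(jsonlite)

# Placeholder payload with made-up values; the structure is an ASSUMPTION
# about the response, not a documented contract.
sample_json <- '{
  "quoteSummary": {
    "result": [{
      "incomeStatementHistoryQuarterly": {
        "incomeStatementHistory": [
          {"endDate": {"fmt": "2020-06-30"},
           "totalRevenue": {"raw": 100}, "netIncome": {"raw": 20}},
          {"endDate": {"fmt": "2020-03-31"},
           "totalRevenue": {"raw": 90}, "netIncome": {"raw": 15}}
        ]
      }
    }],
    "error": null
  }
}'

# Hypothetical helper: one data frame row per quarterly statement.
# Fields missing from a statement become NA.
quarterly_income <- function(parsed, fields = c("totalRevenue", "netIncome")) {
  module <- parsed$quoteSummary$result[[1]]$incomeStatementHistoryQuarterly
  statements <- module$incomeStatementHistory
  rows <- lapply(statements, function(s) {
    values <- lapply(fields, function(f) {
      raw <- s[[f]]$raw
      if (is.null(raw)) NA_real_ else as.numeric(raw)
    })
    names(values) <- fields
    data.frame(endDate = s$endDate$fmt, values, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

parsed <- fromJSON(sample_json, simplifyVector = FALSE)
df <- quarterly_income(parsed)

# With the live endpoint (requires network access):
# parsed <- fromJSON(url, simplifyVector = FALSE)
```

Parsing with `simplifyVector = FALSE` keeps the response as nested lists, which makes it easier to handle statements where some fields are absent.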

Upvotes: 1
