Reputation: 1
I am trying to scrape Yahoo Finance data. I found a solution that works for annual data, but I can't figure out how to make the leap to quarterly data, and I am wondering whether I am on the wrong path. Here's the solution that worked for me: R: web scraping yahoo.finance after 2019 change
Upvotes: 0
Views: 647
Reputation: 2243
You can also use the R package RSelenium to switch the page to quarterly data:
library(rvest)
library(stringr)
library(magrittr)
library(RSelenium)
# Start a Selenium server in Docker. shell() exists only on Windows; use system() elsewhere.
shell('docker run -d -p 4445:4444 selenium/standalone-firefox')
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
remDr$navigate("https://finance.yahoo.com/quote/AAPL/financials?p=AAPL")
# Click the "Quarterly" button so the page swaps the annual table for the quarterly one.
web_Obj_Quarterly <- remDr$findElement("xpath", '//*[@id="Col1-1-Financials-Proxy"]/section/div[1]/div[2]/button/div/span')
web_Obj_Quarterly$clickElement()
# Give the quarterly table a moment to load before reading the page (adjust as needed).
Sys.sleep(2)
page_Content <- remDr$getPageSource()[[1]]
page <- read_html(page_Content)
nodes <- page %>% html_nodes(".fi-row")
# Build the data frame one table row at a time.
df <- NULL
for (i in nodes) {
  cells <- i %>% html_nodes("[title],[data-test='fin-col']") %>% html_text()
  df <- rbind(df, as.data.frame(matrix(cells, nrow = 1), stringsAsFactors = FALSE))
}
# Pull the period-end dates out of the page text to use as column headers.
matches <- str_match_all(page %>% html_node('#Col1-3-Financials-Proxy') %>% html_text(), '\\d{1,2}/\\d{1,2}/\\d{4}')
headers <- c('Breakdown', 'TTM', matches[[1]][, 1])
names(df) <- headers
View(df)
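The cells scraped this way come back as text, so a value such as "274,515" has to be converted before you can calculate with it. Here is a minimal sketch of that cleanup step, assuming the table uses commas as thousands separators and a dash for missing values. The `to_number` helper and the `df_demo` data are made up for illustration and are not part of the scrape above.

```r
# Convert scraped financial strings such as "1,234" or "-" to numbers.
to_number <- function(x) {
  x <- trimws(x)
  x[x %in% c("-", "", "N/A")] <- NA   # assumed placeholders for missing values
  as.numeric(gsub(",", "", x, fixed = TRUE))
}

# Hypothetical stand-in for the scraped table; real labels and values will differ.
df_demo <- data.frame(
  Breakdown = c("Row A", "Row B"),
  TTM = c("1,234", "-"),
  `12/31/2019` = c("-5,678.5", "90"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Leave the label column alone and convert every other column.
df_demo[-1] <- lapply(df_demo[-1], to_number)
str(df_demo)
```

With the real table, the same `lapply` call on `df[-1]` should work as long as the first column holds the row labels.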
This answer relies on: R: web scraping yahoo.finance after 2019 change
Upvotes: 0
Reputation: 764
One of the problems with scraping that page is that it defaults to annual data. The quarterly data is only loaded in the browser after a user clicks the "Quarterly" button. While that's bad for scraping, it's good for intercepting API requests. If you open the developer console in your browser, go to the Network tab, and then click the "Quarterly" button, you'll see a request being made (I put the URL down at the bottom as it's really long). The request returns JSON data.
Disclaimer: I do not know a lot about R. But, after doing a little bit of research, I found that R has a couple of packages that allow you to read JSON data, and you can do something like this:
# using rjson
url <- "<get from down below>"
data <- rjson::fromJSON(file = url)
# using jsonlite
library(jsonlite)
url <- "<get from down below>"
data <- fromJSON(url)
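Once the JSON is parsed, you still need to dig the values out of the resulting object. Below is a rough sketch of that step with jsonlite, run on an inline sample so it works without a network call. The payload and its field names (`results`, `period`, `revenue`) are placeholders I made up, not the actual response schema, so inspect the real response with `str(data)` and adjust the names.

```r
library(jsonlite)

# Hypothetical payload: these field names are placeholders, not Yahoo's schema.
sample_json <- '{"results": [
  {"period": "2019-12-31", "revenue": 100.5},
  {"period": "2019-09-30", "revenue": 90.25}
]}'

# By default, jsonlite turns a JSON array of objects into a data frame.
parsed <- fromJSON(sample_json)
quarterly <- parsed$results

# Parse the dates and sort the quarters from oldest to newest.
quarterly$period <- as.Date(quarterly$period)
quarterly <- quarterly[order(quarterly$period), ]
print(quarterly)
```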
Here's the URL:
There's another URL you can use to get quarterly income statement data, but it seems to be a little erratic for companies outside the U.S.:
Upvotes: 1