Reputation: 3269
I created the code below that works great in getting some data (stock prices and volumes) from Yahoo Finance for a particular company. My challenge is that I got only 100 observations. I would like to get all the data that is in there.
How should I adjust the code to achieve that? Thanks.
library(rvest)
library(tidyverse)
x <- 'SAP'
wb <- paste0('https://finance.yahoo.com/quote/', x, '/history?p=', x)
browseURL(wb)
read_html(wb) %>%
html_node(css = "table") %>%
html_table %>%
as_tibble -> df
Upvotes: 2
Views: 1219
Reputation: 174468
There is actually a link to a downloadable csv on the page. You can specify the company and time period etc, in R prior to downloading it.
The only problem is that the url needs to include a parameter called "crumbs" which is effectively your session ID, and you'll need to harvest this from the page to construct the url.
I have written a function to do all this. All you need to do is supply the company name and it will give you all financial records in the last 15 years or so as a data frame.
get_financial_data <- function(company)
{
tmp <- tempfile()
yahoo <- httr::handle('https://finance.yahoo.com/quote/')
'https://finance.yahoo.com/quote/' %>%
paste0(company) %>%
paste0('/history?p=') %>%
paste0(company) %>%
httr::GET(handle = yahoo) %>%
httr::content("text") -> raw_html
strsplit(raw_html, "crumb\":\"") %>%
unlist %>%
strsplit("\"") %>%
lapply(`[`, 1) %>%
unlist %>%
table %>%
sort %>%
rev %>%
names %>%
`[`(2) %>%
paste0("https://query1.finance.yahoo.com/v7/finance/download/", company,
"?period1=1079873500&period2=1579873500&interval=1d&events=history&crumb=", .) %>%
httr::GET(handle = yahoo) %>%
httr::content("text", encoding = "UTF-8") %>%
writeBin(tmp, useBytes = T)
suppressWarnings(read.csv(tmp) %>% tibble::as_tibble())
}
Now you can just do:
get_financial_data("SAP")
#> # A tibble: 3,988 x 7
#> Date Open High Low Close Adj.Close Volume
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 2004-03-22 37.4 37.5 36.7 37.3 29.3 1717600
#> 2 2004-03-23 37.8 37.9 37.4 37.5 29.5 1417500
#> 3 2004-03-24 37.4 38.1 37.2 37.5 29.5 1682500
#> 4 2004-03-25 37.9 38.9 37.9 38.8 30.4 2233800
#> 5 2004-03-26 37.9 38.2 37.5 37.9 29.8 3343500
#> 6 2004-03-29 38.6 39.3 38.6 38.8 30.5 1886800
#> 7 2004-03-30 38.8 39.3 38.8 39.1 30.7 997900
#> 8 2004-03-31 39.5 39.8 39.3 39.3 30.9 1818600
#> 9 2004-04-01 39.8 40.6 39.8 40.6 31.9 2570600
#> 10 2004-04-02 40.9 41.3 40.6 41.1 32.3 1708200
#> # ... with 3,978 more rows
Upvotes: 4