AlexB
AlexB

Reputation: 3269

Extracting data from Yahoo Finance

I created the code below that works great in getting some data (stock prices and volumes) from Yahoo Finance for a particular company. My challenge is that I got only 100 observations. I would like to get all the data that is in there.

How should I adjust the code to achieve that? Thanks.

library(rvest)
library(tidyverse)

x <- 'SAP'
wb <- paste0('https://finance.yahoo.com/quote/', x, '/history?p=', x)

browseURL(wb)

read_html(wb) %>%
  html_node(css =  "table") %>%
  html_table %>%
  as_tibble -> df

Upvotes: 2

Views: 1219

Answers (1)

Allan Cameron
Allan Cameron

Reputation: 174468

There is actually a link to a downloadable csv on the page. You can specify the company and time period etc, in R prior to downloading it.

The only problem is that the url needs to include a parameter called "crumbs" which is effectively your session ID, and you'll need to harvest this from the page to construct the url.

I have written a function to do all this. All you need to do is supply the company name and it will give you all financial records in the last 15 years or so as a data frame.

  get_financial_data <- function(company)
  {
    tmp <- tempfile()
    yahoo <- httr::handle('https://finance.yahoo.com/quote/')

    'https://finance.yahoo.com/quote/' %>%
    paste0(company)                    %>%
    paste0('/history?p=')              %>%
    paste0(company)                    %>%
    httr::GET(handle = yahoo)          %>%
    httr::content("text")               -> raw_html

    strsplit(raw_html, "crumb\":\"")   %>%
    unlist                             %>%
    strsplit("\"")                     %>%
    lapply(`[`, 1)                     %>%
    unlist                             %>%
    table                              %>%
    sort                               %>%
    rev                                %>%
    names                              %>%
    `[`(2)                             %>% 
    paste0("https://query1.finance.yahoo.com/v7/finance/download/", company, 
          "?period1=1079873500&period2=1579873500&interval=1d&events=history&crumb=", .) %>%
    httr::GET(handle = yahoo) %>%
    httr::content("text", encoding = "UTF-8") %>%
    writeBin(tmp, useBytes = T)

    suppressWarnings(read.csv(tmp) %>% tibble::as_tibble())
  }

Now you can just do:

get_financial_data("SAP")
#> # A tibble: 3,988 x 7
#>    Date        Open  High   Low Close Adj.Close  Volume
#>    <fct>      <dbl> <dbl> <dbl> <dbl>     <dbl>   <int>
#>  1 2004-03-22  37.4  37.5  36.7  37.3      29.3 1717600
#>  2 2004-03-23  37.8  37.9  37.4  37.5      29.5 1417500
#>  3 2004-03-24  37.4  38.1  37.2  37.5      29.5 1682500
#>  4 2004-03-25  37.9  38.9  37.9  38.8      30.4 2233800
#>  5 2004-03-26  37.9  38.2  37.5  37.9      29.8 3343500
#>  6 2004-03-29  38.6  39.3  38.6  38.8      30.5 1886800
#>  7 2004-03-30  38.8  39.3  38.8  39.1      30.7  997900
#>  8 2004-03-31  39.5  39.8  39.3  39.3      30.9 1818600
#>  9 2004-04-01  39.8  40.6  39.8  40.6      31.9 2570600
#> 10 2004-04-02  40.9  41.3  40.6  41.1      32.3 1708200
#> # ... with 3,978 more rows

Upvotes: 4

Related Questions