js80

Reputation: 385

Webscraping with Yahoo Finance

I've seen a few threads on web scraping since Yahoo Finance changed its page layout. The following script works well for one ticker, but when I wrap it in a loop that repeats it for many tickers and binds the results into one large data frame (with the corresponding ticker on each row), I get the following message:

Error in open.connection(x, "rb") : HTTP error 503.

Here is the script with the loop over "tickers":

library(quantmod)
library(rvest)   # read_html(), html_table()
library(purrr)   # map_df()
library(dplyr)   # %>%, bind_cols(), as_tibble()

symbolData2 <- stockSymbols(exchange="NASDAQ")
symbolData3 <- stockSymbols(exchange="NYSE")
complete_symbols <- rbind(symbolData2,symbolData3)
tickers <- complete_symbols$Symbol
stocks <- tickers

stockdatalist <- list()  # collects one data frame per ticker

for (s in stocks) {
  url <- paste0("https://finance.yahoo.com/quote/",s,"/key-statistics?p=", s)
  df <- url %>%
    read_html() %>%
    html_table(header = FALSE) %>%
    map_df(bind_cols) %>%
    as_tibble()

  df['stock'] <- s
  stockdatalist[[s]] <- df
}

stockdata <- do.call(rbind, stockdatalist)

# move the 'stock' column to the front
stockdata <- stockdata[, c(ncol(stockdata), 1:(ncol(stockdata)-1))]

If a particular ticker is tripping up the run, it's hard to pinpoint which one, and I'd rather the script simply skip over it. Any help finalizing this is much appreciated.

Upvotes: 1

Views: 684

Answers (1)

phiver

Reputation: 23598

I rewrote the answer to get only the fundamental data. First off, instead of a loop I put your scrape request into a function. Next I wrote an error-catching function loosely based on the possibly() function from purrr. The difference is that the fallback is a function rather than a fixed default value, so it can use the arguments of the failed call. Then you can use map_df to loop over all the ticker symbols. Whenever there is an error, the data will be NA, but the row will still show the ticker and fill in an error column.
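To illustrate why a function-valued fallback is useful, here is a minimal sketch that runs without network access. The names risky and with_fallback are hypothetical stand-ins chosen for this illustration; they are not part of the scraping code.

```r
library(purrr)

# Hypothetical stand-in for a scrape that fails for some inputs
risky <- function(x) {
  if (x == "bad") stop("request failed")
  toupper(x)
}

# possibly() substitutes a fixed default, so the failing input is not recorded
safe_fixed <- possibly(risky, otherwise = NA_character_)
res_fixed <- map_chr(c("aapl", "bad"), safe_fixed)

# A fallback *function* receives the original arguments, so it can report them
with_fallback <- function(.f, otherwise) {
  function(...) tryCatch(.f(...), error = function(e) otherwise(...))
}
safe_fn <- with_fallback(risky, otherwise = function(x) paste("failed:", x))
res_fn <- map_chr(c("aapl", "bad"), safe_fn)
```

With possibly() the second result is just NA, while the function-based fallback can say which input failed.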

If speed is an issue, you might look into the furrr package to be able to run all of this in parallel.
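As a rough sketch of what that could look like, assuming the furrr and future packages are installed: future_map_dfr() plays the role of map_df(). The function fake_get_stats is a hypothetical stand-in so the example runs offline; in practice the wrapped scraping function would take its place, and the worker count of 2 is an arbitrary choice.

```r
library(future)
library(furrr)

# Hypothetical stand-in for the scraping function, so no network is needed
fake_get_stats <- function(symbol) {
  data.frame(valuation_measures = "Trailing P/E",
             value = NA_character_,
             stock = symbol,
             stringsAsFactors = FALSE)
}

plan(multisession, workers = 2)  # two background R sessions (assumed setting)

# future_map_dfr() mirrors map_df(): apply the function, then row-bind results
out_parallel <- future_map_dfr(c("AAPL", "MSFT"), fake_get_stats)

plan(sequential)  # switch back to ordinary sequential evaluation
```

Keep in mind that parallel requests also raise the request rate, which may make responses like the HTTP 503 from the question more likely.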

library(rvest)
library(purrr)
library(dplyr)

get_stats <- function(symbol) {
  url <- paste0("https://finance.yahoo.com/quote/", symbol, "/key-statistics?p=", symbol)
  # read every table on the page and stack them into one tibble
  df <- url %>%
    read_html() %>%
    html_table(header = FALSE) %>%
    map_df(bind_cols) %>%
    as_tibble()

  names(df) <- c("valuation_measures", "value")
  df["stock"] <- symbol

  return(df)
}

# Wrap .f so that, on error, the function `otherwise` is called with the
# same arguments; `otherwise` must therefore be a function, not a value
catch_error <- function(.f, otherwise=NULL) {
  function(...) {
    tryCatch({
      .f(...)
    }, error = function(e) otherwise(...))
  }
}

tickers <- c("xxxxxx", "AAPL")

# fallback: an all-NA row that still records the ticker and an error message
error_row <- function(x) {
  tibble(valuation_measures = NA_character_, value = NA_character_,
         stock = x, error = "error in getting data")
}

out <- map_df(tickers, catch_error(get_stats, otherwise = error_row))

# A tibble: 60 x 4
   valuation_measures          value stock  error                
   <chr>                       <chr> <chr>  <chr>                
 1 NA                          NA    xxxxxx error in getting data
 2 Market Cap (intraday) 5     1.22T AAPL   NA                   
 3 Enterprise Value 3          1.23T AAPL   NA                   
 4 Trailing P/E                22.07 AAPL   NA                   
 5 Forward P/E 1               17.81 AAPL   NA                   
 6 PEG Ratio (5 yr expected) 1 1.52  AAPL   NA                   
 7 Price/Sales (ttm)           4.54  AAPL   NA                   
 8 Price/Book (mrq)            13.61 AAPL   NA                   
 9 Enterprise Value/Revenue 3  4.58  AAPL   NA                   
10 Enterprise Value/EBITDA 6   15.69 AAPL   NA                   
# ... with 50 more rows
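
Since the question mentions that it is hard to pinpoint which ticker fails, the error column can be used to list the failed symbols and to drop their placeholder rows. The sketch below works on out_example, a small hand-built stand-in with the same columns as out above (two rows copied from the printed output), so it runs without hitting Yahoo Finance.

```r
library(dplyr)

# Illustrative stand-in with the same columns as `out`
out_example <- tibble(
  valuation_measures = c(NA, "Trailing P/E"),
  value              = c(NA, "22.07"),
  stock              = c("xxxxxx", "AAPL"),
  error              = c("error in getting data", NA)
)

# Tickers whose request failed
failed_tickers <- out_example %>%
  filter(!is.na(error)) %>%
  distinct(stock) %>%
  pull(stock)

# Successful rows only, without the error column
clean <- out_example %>%
  filter(is.na(error)) %>%
  select(-error)
```

Here failed_tickers holds the symbols to inspect or retry, and clean holds only the rows that were scraped successfully.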

Upvotes: 2
