Reputation: 385
I've seen a few threads on web scraping since some page changes were made to Yahoo Finance. The following script works well for one ticker, but wrapping it in a loop that repeats it for many tickers and then binds the results into one large data frame, with the corresponding ticker on each row, produces the following message:
Error in open.connection(x, "rb") : HTTP error 503.
Here is the script with the loop - "tickers":
library(quantmod)
library(rvest)   # read_html(), html_table()
library(purrr)   # map_df()
library(dplyr)   # %>%, bind_cols(), as_tibble()

symbolData2 <- stockSymbols(exchange = "NASDAQ")
symbolData3 <- stockSymbols(exchange = "NYSE")
complete_symbols <- rbind(symbolData2, symbolData3)
tickers <- paste(complete_symbols$Symbol, sep = ',')
stocks <- tickers

stockdatalist <- list()  # collects one data frame per ticker
for (s in stocks) {
  url <- paste0("https://finance.yahoo.com/quote/", s, "/key-statistics?p=", s)
  df <- url %>%
    read_html() %>%
    html_table(header = FALSE) %>%
    map_df(bind_cols) %>%
    as_tibble()
  df['stock'] <- s
  stockdatalist[[s]] <- df
}
stockdata <- do.call(rbind, stockdatalist)
# Move the 'stock' column to the front
stockdata <- stockdata[, c(ncol(stockdata), 1:(ncol(stockdata) - 1))]
If a particular ticker is hanging this operation up it's hard to pinpoint which one (and I'd rather the script just be able to skip over it). Any help to finalize this is much appreciated.
Upvotes: 1
Views: 684
Reputation: 23598
I rewrote the answer to get only the fundamental data. First off, instead of a loop, I put your scrape request into a function. Next, I wrote an error-catching function loosely based on the possibly function from purrr. Unlike possibly, its fallback is a function rather than a fixed default value, so the fallback can use the ticker it was called with. You can then use map_df to loop over all the ticker symbols. Whenever there is an error, the data will be NA, but the row will still show the ticker and fill in an error column.
If speed is an issue, you might look into the furrr package to be able to run all of this in parallel.
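As a rough sketch of what the furrr route could look like: the block below uses a hypothetical stand-in, fake_get_stats, so it runs without network access, and the worker count of 2 is an arbitrary assumption. In practice you would pass the error-wrapped scraping function in its place. Note that sending requests in parallel may make responses like the 503 above more likely, so a small number of workers is probably a safer starting point.

```r
library(future)  # plan(), multisession
library(furrr)   # future_map_dfr()
library(tibble)

# Run on 2 background R sessions (worker count is an assumption; tune as needed)
plan(multisession, workers = 2)

# Hypothetical stand-in for the scraping function, so the sketch runs offline.
# It returns one placeholder row per ticker instead of calling Yahoo Finance.
fake_get_stats <- function(symbol) {
  Sys.sleep(0.1)  # simulate a slow request
  tibble(valuation_measures = "placeholder", value = NA_character_, stock = symbol)
}

tickers <- c("AAPL", "MSFT", "GOOG")

# Same shape as map_df(), but each call may run on a different worker;
# results come back row-bound in the original ticker order
out_parallel <- future_map_dfr(tickers, fake_get_stats)

# Return to sequential processing when done
plan(sequential)
```

Swapping fake_get_stats for the wrapped scraper should be the only change needed, assuming the packages it uses are installed for the workers.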
library(rvest)
library(purrr)
library(dplyr)

get_stats <- function(symbol) {
  url <- paste0("https://finance.yahoo.com/quote/", symbol, "/key-statistics?p=", symbol)
  df <- url %>%
    read_html() %>%
    html_table(header = FALSE) %>%
    map_df(bind_cols) %>%
    as_tibble()
  names(df) <- c("valuation_measures", "value")
  df["stock"] <- symbol
  return(df)
}

catch_error <- function(.f, otherwise = NULL) {
  function(...) {
    tryCatch({
      .f(...)
    }, error = function(e) otherwise(...))
  }
}

tickers <- c("xxxxxx", "AAPL")
out <- map_df(
  tickers,
  catch_error(
    get_stats,
    otherwise = function(x) tibble(
      valuation_measures = NA_character_,
      value = NA_character_,
      stock = x,
      error = "error in getting data"
    )
  )
)
# A tibble: 60 x 4
   valuation_measures            value stock  error
   <chr>                         <chr> <chr>  <chr>
 1 NA                            NA    xxxxxx error in getting data
 2 Market Cap (intraday) 5       1.22T AAPL   NA
 3 Enterprise Value 3            1.23T AAPL   NA
 4 Trailing P/E                  22.07 AAPL   NA
 5 Forward P/E 1                 17.81 AAPL   NA
 6 PEG Ratio (5 yr expected) 1   1.52  AAPL   NA
 7 Price/Sales (ttm)             4.54  AAPL   NA
 8 Price/Book (mrq)              13.61 AAPL   NA
 9 Enterprise Value/Revenue 3    4.58  AAPL   NA
10 Enterprise Value/EBITDA 6     15.69 AAPL   NA
# ... with 50 more rows
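Since the question also asked how to pinpoint which tickers fail, the error column can be used for that. The following is a minimal sketch in base R, using a small hand-built sample with the same four columns as the result above (the rows are copied from the printed output; the names failed_tickers and stockdata are just illustrative).

```r
# Hand-built sample with the same columns as the scraped result
# (three rows copied from the printed tibble; a real run has many more)
out <- data.frame(
  valuation_measures = c(NA, "Market Cap (intraday) 5", "Enterprise Value 3"),
  value = c(NA, "1.22T", "1.23T"),
  stock = c("xxxxxx", "AAPL", "AAPL"),
  error = c("error in getting data", NA, NA),
  stringsAsFactors = FALSE
)

# Tickers whose request failed: rows where the error column is filled in
failed_tickers <- unique(out$stock[!is.na(out$error)])

# Keep only the successful rows, with the ticker as the first column
stockdata <- out[is.na(out$error), c("stock", "valuation_measures", "value")]
```

The same subsetting should work unchanged on the tibble returned by map_df, and failed_tickers can be fed back into the scraper later for a second attempt.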
Upvotes: 2