Reputation: 921
I am running a script that downloads stock prices from Yahoo Finance and then calculates the log of their price changes, the standard deviation for the prior 20 days, multiplies the ith price by the ith standard deviation and then divides the current price change by the price change of the prior day or i-1.
I have for loops everywhere which is the first red flag since there are better ways to do things in R. I am more than happy to use vectorized operations wherever possible.
I am defining an empty data frame in R to contain the information I want. I am reading that this is a bad idea. I do know how big the data frame will be since I know the number of symbols for which I am downloading prices. However, when I initialize the data frame, it simply populates the number of rows specified with 0s and adds the data I want beneath. It is easy to simply fix this by deleting the rows after the fact, but I am sure there is a cleaner way.
The below code creates an error:
1: In `[<-.factor`(`*tmp*`, ri, value = "IBM") :
invalid factor level, NA generated.
If I change the 0s to 1s when initializing SpikeData, it will populate just fine but then I need to remove the blank first row which is done in the 3rd to last line of the script.
The second issue I have is with the column names in the SpikeRank data frame. I want the first column to be titled "Symbols" and the next 5 to be dates. However, I do not seem to be able to mix and match. If I set the first col name to "Symbol," the dates are converted to a serial number. Further down the road with this script, I will be adding other columns that also have characters, so I'd like to be able to mix and match the dates and characters in the column names.
GetVol <- function(Window = 20, end=as.Date(Sys.time()), start = end-37){
library(tseries)
library(zoo)
SpikeRank = data.frame(stringsAsFactors=FALSE, Symbol=character(0),
SDInPriceTerms1=numeric(0), SDInPriceTerms2=numeric(0),
SDInPriceTerms3=numeric(0), SDInPriceTerms4=numeric(0),
SDInPriceTerms5=numeric(0))
SymbolList <- c("AAPL", "IBM")
for (Symbol in SymbolList)
{
ts <- get.hist.quote(instrument=Symbol,
start, end,
quote="Close", provider="yahoo", origin="1970-01-01",
compression="d", retclass="zoo")
df <- data.frame(ts)
df <- data.frame(Date=as.Date(rownames(df)), Close=df$Close)
df$PriorClose <- c(NA, head(df$Close, -1))
df$Return <- log(df$Close/df$PriorClose)
for (i in 1:length(df$Close))
{
if(i < Window+1)
{
df$stddev[i] = NA
df$SDPrice[i] = NA
df$CurrentSpike[i] = NA
}
else
{
df$stddev[i] <- sd(df$Return[(i-Window+1):i], na.rm = TRUE)
df$SDPrice[i] <- df$stddev[i] * df$Close[i]
df$CurrentSpike[i] <- (df$Close[i] - df$PriorClose[i])/df$SDPrice[i-1]
}
}#end for
df <- na.omit(df)
SpikeRank <- rbind(SpikeRank, c(Symbol, df$CurrentSpike))
}#end for loop
SpikeRank <- SpikeRank[-1,]
colnames(SpikeRank) <- c(as.Date(1), df$Date)
print(SpikeRank)
)
The print of SpikeRank assuming that the data frame is initialized with 1s instead of 0s is below. The values in the rows are all correct. I am not having any issue with the calculations.
1970-01-02 2014-03-03 2014-03-04 2014-03-05 2014-03-06
2 AAPL 0.268505943103897 0.613087867831883 0.195414096323545 -0.289567687725125
3 IBM -0.600147413085885 1.43686920161242 0.592564096496001 0.426680866502713
2014-03-07
2 -0.055981998552076
3 0.0344269384370513
Here is df for IBM in case that is helpful:
Date Close PriorClose Return stddev SDPrice CurrentSpike
1 2014-01-30 177.36 NA NA NA NA NA
2 2014-01-31 176.68 177.36 -0.0038413786 NA NA NA
3 2014-02-03 172.90 176.68 -0.0216267940 NA NA NA
4 2014-02-04 172.84 172.90 -0.0003470816 NA NA NA
5 2014-02-05 174.24 172.84 0.0080673481 NA NA NA
6 2014-02-06 174.67 174.24 0.0024648203 NA NA NA
7 2014-02-07 177.25 174.67 0.0146626860 NA NA NA
8 2014-02-10 177.14 177.25 -0.0006207850 NA NA NA
9 2014-02-11 179.70 177.14 0.0143484134 NA NA NA
10 2014-02-12 180.24 179.70 0.0030005023 NA NA NA
11 2014-02-13 181.84 180.24 0.0088378834 NA NA NA
12 2014-02-14 183.69 181.84 0.0101223746 NA NA NA
13 2014-02-18 183.19 183.69 -0.0027256886 NA NA NA
14 2014-02-19 182.95 183.19 -0.0013109741 NA NA NA
15 2014-02-20 184.26 182.95 0.0071349122 NA NA NA
16 2014-02-21 182.79 184.26 -0.0080098508 NA NA NA
17 2014-02-24 183.45 182.79 0.0036041979 NA NA NA
18 2014-02-25 183.23 183.45 -0.0011999565 NA NA NA
19 2014-02-26 184.06 183.23 0.0045195971 NA NA NA
20 2014-02-27 185.27 184.06 0.0065524292 NA NA NA
21 2014-02-28 185.17 185.27 -0.0005398985 0.008188660 1.516294 NA
22 2014-03-03 184.26 185.17 -0.0049265184 0.008233949 1.517188 -0.60014741
23 2014-03-04 186.44 184.26 0.0117616678 0.006336123 1.181307 1.43686920
24 2014-03-05 187.14 186.44 0.0037475283 0.006261815 1.171836 0.59256410
25 2014-03-06 187.64 187.14 0.0026682336 0.006192074 1.161881 0.42668087
26 2014-03-07 187.68 187.64 0.0002131514 0.006236115 1.170394 0.03442694
Thank you for your help.
Upvotes: 1
Views: 185
Reputation: 59395
First of all, your code does not run as presented. This is because you don't bother to define start
, end
, or Window
. So I had to infer those values from your df
for IBM. (BTW: This is probably why no one else has bothered to respond. The expectation on SO is that, if you want help, you will provide a working example).
So this is a shorter way to produce what you have. Note the use of rollapply(...)
and head(...)
to avoid the inner loop, and the use of do.call(rbind,lapply(...))
to avoid the outer loop and the pre-allocation of SpikeRank
. Your insistence on having dates as column names created a lot of problems, because default behavior in most functions that create data frames is to avoid column names that begin with a number.
library(tseries) # for get.hist.quote
library(zoo) # for rollapply
start <- "2014-01-30"
end <- "2014-03-07"
Window <- 20
SymbolList <- c("AAPL","IBM")
get.SpikeRank <- function(Symbol,start, end, Window) {
ts <- get.hist.quote(instrument=Symbol,
start, end,
quote="Close", provider="yahoo", origin="1970-01-01",
compression="d", retclass="zoo")
df <- data.frame(ts)
df <- data.frame(Date=as.Date(rownames(df)), Close=df$Close)
df$PriorClose <- c(NA, head(df$Close, -1))
df$Return <- log(df$Close/df$PriorClose)
df$stdev <- c(rep(NA,Window),rollapply(df$Return[-1],width=Window,sd,na.rm=T))
df$SDPrice <- df$stdev * df$Close
df$CurrentSpike <- (df$Close - df$PriorClose)/c(NA,head(df$SDPrice,-1))
df <- na.omit(df)
row <- df$CurrentSpike
names(row) <- df$Date
return(row)
}
SpikeRank <- do.call(rbind,lapply(SymbolList,get.SpikeRank,start,end,Window))
SpikeRank <- data.frame(Symbol=SymbolList, SpikeRank)
colnames(SpikeRank)[-1] <- substring(colnames(SpikeRank)[-1],2)
print(SpikeRank)
# Symbol 2014.03.03 2014.03.04 2014.03.05 2014.03.06 2014.03.07
# 1 AAPL 0.2685059 0.6130879 0.1954141 -0.2895677 -0.05598200
# 2 IBM -0.6001474 1.4368692 0.5925641 0.4266809 0.03442694
Upvotes: 1