Reputation: 1005
I have a simple table of stock prices, let's call it "price," with tickers along the top and dates along the side.
I want to create a new table, "price_norm," which divides every price by the first one. So for example, the first column might look like:
price price_norm
20.00 1.000
21.00 1.050
21.00 1.050
20.00 1.000
20.50 1.025
...etc
The wrinkle is that some stocks don't have pricing all the way back to the first date, so their column in "price" begins with a string of NAs -- in that case they should be divided by the first non-NA value, not the value in the first row.
There may also be a few stocks with no price at all, where every value is NA; in this case the corresponding column in price_norm should be all NA too.
I have done this in a way that works but very slowly -- 30 seconds or more to process. I'm looking for a more efficient way:
price_norm <- price
nonNAIndex <- function(z) {
min(which(!is.na(z)))
}
for( j in colnames(price) ) {
if(!is.na(price[nrow(price),j])) {
k <- nonNAIndex(price[,j])
for( i in k:nrow(price) ) {
price_norm[i,j] <- ( price[i,j] / price[k,j] )
}
}
}
Upvotes: 2
Views: 75
Reputation: 887891
Or you could try: (Using @Ananda Mahto's dataset)
m1 <- t(mydf)
mydf/as.list(m1[cbind(1:nrow(m1),max.col(!is.na(m1), "first"))])
On a slightly bigger dataset
set.seed(24)
df <- as.data.frame(matrix(sample(c(NA,1:25), 1e3*2e3, replace=TRUE), ncol=2e3))
f1 <- function() df/lapply(df, function(x) ifelse(all(is.na(x)), NA, na.omit(x)[1]))
f2 <- function() sapply(df, function(x) {x/ifelse(all(is.na(x)), NA, head(x[!is.na(x)],1))})
f3 <- function() {m1 <- t(df)
df/as.list(m1[cbind(1:nrow(m1),max.col(!is.na(m1), "first"))])}
library(microbenchmark)
microbenchmark(f1(), f2(), f3(), unit="relative", times=25L)
#Unit: relative
#expr min lq median uq max neval
#f1() 1.213922 1.197350 1.140682 1.103249 1.351307 25
#f2() 6.318962 6.925681 7.212136 6.370824 8.080331 25
#f3() 1.000000 1.000000 1.000000 1.000000 1.000000 25
Upvotes: 0
Reputation: 3188
Make sure your data is a data.frame
sapply(df, function(x) {x/ifelse(all(is.na(x)), NA, head(x[!is.na(x)],1))})
Upvotes: 0
Reputation: 193687
The following sounds like it should do what you want:
## sample data
mydf <- data.frame(v1 = c(20.00, 21.00, 21.00, 20.00, 20.50),
v2 = c(NA, 20.00, 21.00, 21.00, 20.00),
v3 = NA)
mydf / lapply(mydf, function(x) ifelse(all(is.na(x)), NA, na.omit(x)[1]))
# v1 v2 v3
# 1 1.000 NA NA
# 2 1.050 1.00 NA
# 3 1.050 1.05 NA
# 4 1.000 1.05 NA
# 5 1.025 1.00 NA
I've just created a list of the initial values and divided the original data.frame
by those values.
Upvotes: 2