Peter Lewis
Peter Lewis

Reputation: 1005

faster way to do this table transformation in R

I have a simple table of stock prices, let's call it "price," with tickers along the top and dates along the side.

I want to create a new table, "price_norm," which divides every price by the first one. So for example, the first column might look like:

price     price_norm
20.00     1.000
21.00     1.050
21.00     1.050
20.00     1.000
20.50     1.025

...etc

The wrinkle is that some stocks don't have pricing all the way back to the first date, so their column in "price" begins with a string of NAs -- in that case they should be divided by the first non-NA value, not the value in the first row.

There may also be a few stocks with no price at all, where every value is NA; in this case the corresponding column in price_norm should be all NA too.

I have done this in a way that works but very slowly -- 30 seconds or more to process. I'm looking for a more efficient way:

price_norm <- price

nonNAIndex <- function(z) {
  min(which(!is.na(z)))
}

for( j in colnames(price) ) {

  if(!is.na(price[nrow(price),j])) {  

    k <- nonNAIndex(price[,j])

    for( i in k:nrow(price) ) {
      price_norm[i,j] <- ( price[i,j] / price[k,j] ) 
    }
  }
}

Upvotes: 2

Views: 75

Answers (3)

akrun
akrun

Reputation: 887891

Or you could try: (Using @Ananda Mahto's dataset)

m1 <- t(mydf)
mydf/as.list(m1[cbind(1:nrow(m1),max.col(!is.na(m1), "first"))])

Benchmarks

On a slightly bigger dataset

set.seed(24)
df <- as.data.frame(matrix(sample(c(NA,1:25), 1e3*2e3, replace=TRUE), ncol=2e3))
f1 <- function() df/lapply(df, function(x) ifelse(all(is.na(x)), NA, na.omit(x)[1]))
f2 <- function() sapply(df, function(x) {x/ifelse(all(is.na(x)), NA, head(x[!is.na(x)],1))})
f3 <- function() {m1 <- t(df)
                df/as.list(m1[cbind(1:nrow(m1),max.col(!is.na(m1), "first"))])}

library(microbenchmark)
microbenchmark(f1(), f2(), f3(), unit="relative", times=25L)
#Unit: relative
#expr      min       lq   median       uq      max neval
#f1() 1.213922 1.197350 1.140682 1.103249 1.351307    25
#f2() 6.318962 6.925681 7.212136 6.370824 8.080331    25
#f3() 1.000000 1.000000 1.000000 1.000000 1.000000    25

Upvotes: 0

Vlo
Vlo

Reputation: 3188

Make sure your data is a data.frame

sapply(df, function(x) {x/ifelse(all(is.na(x)), NA, head(x[!is.na(x)],1))})

Upvotes: 0

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193687

The following sounds like it should do what you want:

## sample data
mydf <- data.frame(v1 = c(20.00, 21.00, 21.00, 20.00, 20.50), 
                   v2 = c(NA, 20.00, 21.00, 21.00, 20.00),
                   v3 = NA)

mydf / lapply(mydf, function(x) ifelse(all(is.na(x)), NA, na.omit(x)[1]))
#      v1   v2 v3
# 1 1.000   NA NA
# 2 1.050 1.00 NA
# 3 1.050 1.05 NA
# 4 1.000 1.05 NA
# 5 1.025 1.00 NA

I've just created a list of the initial values and divided the original data.frame by those values.

Upvotes: 2

Related Questions