newbie
newbie

Reputation: 907

How to apply function to each row of data.table

I am trying to calculate net present value (NPV) using library(financial) for each observation from given cashflow in data.table format. Here is my cashflow:

library(data.table)    
dt <- data.table(id=c(1,2,3,4), Year1=c(NA, 30, 40, NA), Year2=c(20, 30, 20 ,70), Year3=c(60, 40, 0, 10))

To calculate NPV and update in data.table,

library(financial)
npv <- apply(dt, 1, function(x) cf(na.omit(x[-1]), i = 20)$tab[, 'NPV'])
dt[, NPV:=npv]

return,

   id Year1 Year2 Year3      NPV
1:  1    NA    20    60 70.00000
2:  2    30    30    40 82.77778
3:  3    40    20     0 56.66667
4:  4    NA    70    10 78.33333

How can I directly update result using function cf to each row in the data.table ?

FYI: In my real dataset, there are over 50 columns

Upvotes: 2

Views: 2266

Answers (3)

dmi3kno
dmi3kno

Reputation: 3055

We could try and make our own npv function for using in this example.

dcf <- function(x, r, t0=FALSE){
  # calculates discounted cash flows (DCF) given cash flow and discount rate
  #
  # x - cash flows vector
  # r - vector or discount rates, in decimals. Single values will be recycled
  # t0 - cash flow starts in year 0, default is FALSE, i.e. discount rate in first period is zero.
  if(length(r)==1){
    r <- rep(r, length(x))
    if(t0==TRUE){r[1]<-0}
  }
  x/cumprod(1+r)
}

npv <- function(x, r, t0=FALSE){
  # calculates net present value (NPV) given cash flow and discount rate
  #
  # x - cash flows vector
  # r - discount rate, in decimals
  # t0 - cash flow starts in year 0, default is FALSE
  sum(dcf(x, r, t0))
}

Now, whenever you want to apply(x,1,f), melt/gather/nest instead. Unless you are intending to totally mislead the users of your data, you should never drop NA when calculating NPV. This will mean that you are discounting cash flows to different points in time. Replace NA with 0 instead. I also see that the package you intended to use, discounts cash flows to year 0, basically meaning that the first cash flow (in Year1) is not discounted.

library(data.table)
npv_dt <- melt(dt, id.vars = "id")[is.na(value), value:=0][order(variable), .(NPV=npv(x=value, r=0.2, t0=TRUE)), by="id"]

setkey(dt, id)
setkey(npv_dt, id)

npv_dt[dt]

#>    id      NPV Year1 Year2 Year3
#> 1:  1 58.33333    NA    20    60
#> 2:  2 82.77778    30    30    40
#> 3:  3 56.66667    40    20     0
#> 4:  4 65.27778    NA    70    10

Upvotes: 0

thelatemail
thelatemail

Reputation: 93938

Rewriting the cf function to only calculate the part that is needed will speed things up dramatically:

dt[, NPV := {x <- na.omit(unlist(.SD)); sum(x * sppv(20,0:(length(x)-1)))}, by=id]

#   id Year1 Year2 Year3      NPV
#1:  1    NA    20    60 70.00000
#2:  2    30    30    40 82.77778
#3:  3    40    20     0 56.66667
#4:  4    NA    70    10 78.33333

In fact, this could probably be vectorised now.... hmmm, let me think!

Upvotes: 0

akrun
akrun

Reputation: 887841

We can try a join based approach

dt[melt(dt, id.var = "id")[, .(NPV = cf(value[!is.na(value)], 
                      i = 20)$tab[, "NPV"]), id], on = 'id']
#   id Year1 Year2 Year3      NPV
#1:  1    NA    20    60 70.00000
#2:  2    30    30    40 82.77778
#3:  3    40    20     0 56.66667
#4:  4    NA    70    10 78.33333

Upvotes: 1

Related Questions