How to apply a self-defined function to the result of group_by

Question

I'd like to group the data by some column and then replace each NA with the most recent observation within its group. Is there any way to apply a function other than an aggregation function to the result of group_by?

Here are two samples implemented with ddply:

1:

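library(data.table); library(plyr); library(zoo)  # data.table(), ddply(), na.locf()
dt <- data.table(A = rep(1:3, 2), B = c(1, 2, NA, NA, 2, 5), C = c(9, NA, 2, 8, NA, 4))
ddply(dt, "A", function(x) na.locf(x, na.rm = FALSE, fromLast = FALSE))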

2:

ddply(dt, "A", function(x) {
  if (x[1, "A"] > 2) {
    x[, 2:3] * 1
  } else {
    x[, 2:3] * (-1)
  }
})

I don't know how to replicate this with group_by, which should be faster than ddply. By the way, is there any NA replacement function quicker than na.locf?

Many thanks in advance.

David Arenburg · Accepted Answer

Here's how you would do this with dplyr:

dt %>%
   group_by(A) %>%
   mutate_each(funs(na.locf(., na.rm = FALSE, fromLast = FALSE)))
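
Note that mutate_each() and funs() have since been deprecated in dplyr. A minimal sketch of the same idea with across(), assuming dplyr 1.0 or later and the zoo package (the names df and filled are illustrative, not from the original answer):

```r
library(dplyr)
library(zoo)

# Sample data from the question, as a plain data frame
df <- data.frame(A = rep(1:3, 2),
                 B = c(1, 2, NA, NA, 2, 5),
                 C = c(9, NA, 2, 8, NA, 4))

# Within each group of A, carry the last non-NA value forward.
# across(everything()) skips the grouping column A automatically.
filled <- df %>%
  group_by(A) %>%
  mutate(across(everything(), ~ na.locf(.x, na.rm = FALSE))) %>%
  ungroup()
```

Because mutate() keeps the original row order, only the NA in the fourth row of B has an earlier value in its group to borrow; the other NAs are either leading or belong to an all-NA group and stay as they are.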

But if you are already using data.table, why not just use it?

dt[, lapply(.SD, na.locf, na.rm = FALSE, fromLast = FALSE), by = A]
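
The same lapply(.SD, ...) pattern accepts any function, not just na.locf, so the second example from the question could plausibly be written along these lines. This is a sketch only: flip_unless_large is a hypothetical helper name, and .BY is data.table's list holding the current group's key.

```r
library(data.table)

# Sample data from the question
dt <- data.table(A = rep(1:3, 2),
                 B = c(1, 2, NA, NA, 2, 5),
                 C = c(9, NA, 2, 8, NA, 4))

# Hypothetical helper (not from the original post): keep the values
# when the group key is greater than 2, otherwise flip their sign.
flip_unless_large <- function(v, key) if (key > 2) v else -v

# .SD holds the non-grouping columns (B and C) for the current group;
# .BY$A is that group's value of A.
res <- dt[, lapply(.SD, flip_unless_large, key = .BY$A), by = A]
```

The result has one block of rows per group, in order of each group's first appearance, rather than in the original row order.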

You could also update the data table by reference using the := operator, as in:

dt[, names(dt)[-1] := lapply(.SD, na.locf, na.rm = FALSE, fromLast = FALSE), A]
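
As for a quicker alternative to na.locf: recent versions of data.table (1.12.4 or later, as far as I know) ship their own nafill() function, which is implemented in C and works on numeric columns. Whether it is actually faster on your data is something to benchmark rather than assume. A minimal sketch, with cols as an illustrative variable name:

```r
library(data.table)

# Sample data from the question
dt <- data.table(A = rep(1:3, 2),
                 B = c(1, 2, NA, NA, 2, 5),
                 C = c(9, NA, 2, 8, NA, 4))

# nafill() with type = "locf" carries the last observation forward
# and leaves leading NAs untouched, like na.locf(na.rm = FALSE).
cols <- c("B", "C")
dt[, (cols) := lapply(.SD, nafill, type = "locf"), by = A, .SDcols = cols]
```

This updates dt in place and drops the dependency on zoo, but nafill() is limited to numeric vectors, so na.locf remains the more general choice for character or factor columns.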
