Ben

Reputation: 21705

How do I apply a function to row subsets of a data.table where each call returns a data.table

Here's a data.table

library(data.table)
dt <- data.table(group = c("a","a","a","b","b","b"), x = c(1,3,5,1,3,5), y = c(3,5,8,2,8,9))
dt
   group x y
1:     a 1 3
2:     a 3 5
3:     a 5 8
4:     b 1 2
5:     b 3 8
6:     b 5 9

And here's a function that operates on a data.table and returns a data.table

myfunc <- function(dt){
  # Hyman spline interpolation (which preserves monotonicity)

  newdt <- data.table(x = seq(min(dt$x), max(dt$x)))
  newdt$y <- spline(x = dt$x, y = dt$y, xout = newdt$x, method = "hyman")$y
  return(newdt)
}

How do I apply myfunc to each subset of dt defined by the "group" column? In other words, I want an efficient, generalized way to do this

result <- rbind(myfunc(dt[group=="a"]), myfunc(dt[group=="b"]))
result
    x     y
 1: 1 3.000
 2: 2 3.875
 3: 3 5.000
 4: 4 6.375
 5: 5 8.000
 6: 1 2.000
 7: 2 5.688
 8: 3 8.000
 9: 4 8.875
10: 5 9.000

EDIT: I've updated my sample dataset and myfunc because I think it was initially too simplistic and invited work-arounds to the actual problem I'm trying to solve.

Upvotes: 3

Views: 143

Answers (1)

David Arenburg

Reputation: 92300

data.table is designed to be both memory efficient and fast. For that reason we avoid using $ within the data.table scope (except in very rare situations), and we avoid creating data.table objects inside a data.table's environment (currently, even .SD has an overhead).
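
As a small illustration of the first point, here is a sketch using made-up names (`d1`, `d2`, and `z` do not appear in the question). Inside `[ ]`, `:=` adds the column by reference, while the `$<-` form goes through base R assignment. Both give the same values here; the difference is in how the table is modified.

```r
library(data.table)

# Two identical tables so the two styles can be compared side by side
d1 <- data.table(x = c(1, 3, 5), y = c(3, 5, 8))
d2 <- copy(d1)

# data.table style: add z by reference inside the data.table scope
d1[, z := x + y]

# base R style: works, but bypasses data.table's by-reference update
d2$z <- d2$x + d2$y
```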

In your case you can take advantage of data.table's non-standard evaluation capabilities and define your function as follows

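myfunc <- function(x, y){
   # Grid of x values in steps of 1 across the observed range
   temp = seq(min(x), max(x))
   # Hyman spline interpolation (preserves monotonicity)
   y = spline(x = x, y = y, xout = temp, method = "hyman")$y
   # Return a plain list; data.table turns it into columns for each group
   list(x = temp, y = y)
}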

Then the implementation within the dt scope is straightforward:

dt[, myfunc(x, y), by = group]
#     group x      y
#  1:     a 1 3.0000
#  2:     a 2 3.8750
#  3:     a 3 5.0000
#  4:     a 4 6.3750
#  5:     a 5 8.0000
#  6:     b 1 2.0000
#  7:     b 2 5.6875
#  8:     b 3 8.0000
#  9:     b 4 8.8750
# 10:     b 5 9.0000
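
For comparison, the question's original data.table-in, data.table-out function can also be called per group by passing `.SD`. The sketch below assumes the sample data from the question and uses the hypothetical names `interp_cols` and `interp_dt` so that the two versions of `myfunc` do not clash. It should give the same rows as the column-based call, but it pays the `.SD` overhead mentioned above and builds a new data.table for every group.

```r
library(data.table)

dt <- data.table(group = c("a", "a", "a", "b", "b", "b"),
                 x = c(1, 3, 5, 1, 3, 5),
                 y = c(3, 5, 8, 2, 8, 9))

# Hypothetical name: column vectors in, plain list out (the answer's style)
interp_cols <- function(x, y) {
  xout <- seq(min(x), max(x))
  list(x = xout,
       y = spline(x = x, y = y, xout = xout, method = "hyman")$y)
}

# Hypothetical name: data.table in, data.table out (the question's style)
interp_dt <- function(d) {
  xout <- seq(min(d$x), max(d$x))
  data.table(x = xout,
             y = spline(x = d$x, y = d$y, xout = xout, method = "hyman")$y)
}

# Column-based call: no per-group data.table is constructed
res_cols <- dt[, interp_cols(x, y), by = group]

# .SD-based call: each group's subset is passed as a data.table
res_sd <- dt[, interp_dt(.SD), by = group]
```

On a table with many small groups, the column-based form is the one to prefer; the `.SD` form is mainly convenient when the function cannot easily be rewritten to take vectors.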

Upvotes: 7
