ekstroem
ekstroem

Reputation: 6151

Evaluating function arguments to pass to data.table

I have this piece of code that I'd like to wrap in a function

indata <- data.frame(id = c(1L, 2L, 3L, 4L, 12L, 13L, 14L, 15L), 
                     fid = c(NA, 9L, 1L, 1L, 7L, 5L, 5L, 5L), 
                     mid = c(0L, NA, 2L, 2L, 6L, 6L, 6L, 8L))
library(data.table)
DT <- as.data.table(indata)

DT[, msib:=.(list(id)), by = mid][                                              
   ,msibs := mapply(setdiff, msib, id)][
   ,fsib  := .(list(id)), by = fid][
   ,fsibs := mapply(setdiff, fsib, id)][
   ,siblist  := mapply(union, msibs, fsibs)][
   ,c("msib","msibs", "fsib", "fsibs") := NULL] 

So far so good. Works as desired. Now it should be wrapped in a function, where I can pass alternative variable names (without quoting if possible), and here's my first try.

f <- function(DT, id, fid, mid) {

    DT[, msib:=.(list(id)), by = mid][                                              
       ,msibs := mapply(setdiff, msib, id)][
       ,fsib  := .(list(id)), by = fid][
       ,fsibs := mapply(setdiff, fsib, id)][
       ,siblist  := mapply(union, msibs, fsibs)][
       ,c("msib","msibs", "fsib", "fsibs") := NULL] 
}

I know this isn't working but lets look at the error it throws

indata2 <- indata
names(indata2) <- c("A", "B", "C")  # Give new names
DT2 <- as.data.table(indata2)
f(DT2, A, B, C)

Error in as.vector(x, "list") : cannot coerce type 'closure' to vector of type 'list'

That makes sense. Now to make sure that the promises are evaluated correctly I tried this

f <- function(DT, id, fid, mid) {
    mid <- deparse(substitute(mid))
    id <- deparse(substitute(id))
    fid <- deparse(substitute(fid))

    DT[, msib:=.(list(id)), by = mid][                                              
       ,msibs := mapply(setdiff, msib, id)][
       ,fsib  := .(list(id)), by = fid][
       ,fsibs := mapply(setdiff, fsib, id)][
       ,siblist  := mapply(union, msibs, fsibs)][
       ,c("msib","msibs", "fsib", "fsibs") := NULL] 
}

That doesn't throw an error but also does not work. The output looks like this

f(DT2, A, B, C)
    A  B  C siblist
1:  1 NA  0        
2:  2  9 NA        
3:  3  1  2        
4:  4  1  2        
5: 12  7  6        
6: 13  5  6        
7: 14  5  6        
8: 15  5  8   

and the siblist column is empty which it shouldn't be and isn't when I run it manually. I also tried this version (converting it to character strings) to see if that worked:

f <- function(DT, id, fid, mid){
    mid <- as.character(substitute(mid))
    id <- as.character(substitute(id))
    fid <- as.character(substitute(fid))
    DT[, msib:=.(list(id)), by = mid][ # Siblings through the mother
       ,msibs := mapply(setdiff, msib, id)][
       ,fsib  := .(list(id)), by = fid][
       ,fsibs := mapply(setdiff, fsib, id)][
       ,siblist  := mapply(union, msibs, fsibs)][
       ,c("msib","msibs", "fsib", "fsibs") := NULL] # Removed unused
}

but that doesn't work either - same output as above. I think it may be because the promises in the j part of the data.table are evaluated in the wrong environment but am not sure. How can I fix my function?

Upvotes: 1

Views: 1094

Answers (2)

Nathan Werth
Nathan Werth

Reputation: 5263

If you expect an object to have a certain structure or hold certain data, then defining a class can really help. And with S3, it's simple.

as.relationship <- function(DT, id, fid, mid) {
  out <- DT[, c(id, fid, mid), with = FALSE]
  setnames(out, c("id", "fid", "mid"))
  setattr(out, "class", c("relationship", class(out)))
  out
}

Then you can write a function to work on that class with the safety of knowing where everything is.

f <- function(DT, id, fid, mid) {
  relatives <- as.relationship(DT, id, fid, mid)
  relatives[
    relatives,
    on = "fid",
    allow.cartesian = TRUE
  ][
    relatives,
    on = "mid",
    allow.cartesian = TRUE
  ][
    ,
    {
      siblings    <- union(i.id, i.id.1)
      except_self <- setdiff(siblings, .BY[["id"]])
      list(siblist = list(except_self))
    },
    by = "id"
  ]
}

This function takes the column names as strings. So you'd call it like this:

f(DT, "id", "fid", "mid")
#    id  siblist
# 1:  1         
# 2:  2         
# 3:  3        4
# 4:  4        3
# 5: 12    13,14
# 6: 13 14,15,12
# 7: 14 13,15,12
# 8: 15    13,14

setnames(DT, c("A", "B", "C"))
f(DT, "A", "B", "C")
#    id  siblist
# 1:  1         
# 2:  2         
# 3:  3        4
# 4:  4        3
# 5: 12    13,14
# 6: 13 14,15,12
# 7: 14 13,15,12
# 8: 15    13,14

If you're worried about performance, don't be. If you create a data.table from entire columns of another data.table, they're smart enough not to actually copy the data. They share it. So there's no real performance penalty to making another object.

Upvotes: 2

Aur&#232;le
Aur&#232;le

Reputation: 12819

This is getting ugly but it seems to work. With lots of get()s:

f <- function(DT, id, fid, mid) {
  mid <- deparse(substitute(mid))
  id <- deparse(substitute(id))
  fid <- deparse(substitute(fid))

  DT[, msib:=.(list(get(id))), by = get(mid)][                                              
    ,msibs := mapply(setdiff, msib, get(id))][
      ,fsib  := .(list(get(id))), by = get(fid)][
        ,fsibs := mapply(setdiff, fsib, get(id))][
          ,siblist  := mapply(union, msibs, fsibs)][
            ,c("msib","msibs", "fsib", "fsibs") := NULL] 
}

DT2 <- as.data.table(indata2)
f(DT2, A, B, C)

all.equal(DT, DT2)
# [1] "Different column names"

Upvotes: 0

Related Questions