Reputation: 6151
I have this piece of code that I'd like to wrap in a function
indata <- data.frame(id = c(1L, 2L, 3L, 4L, 12L, 13L, 14L, 15L),
fid = c(NA, 9L, 1L, 1L, 7L, 5L, 5L, 5L),
mid = c(0L, NA, 2L, 2L, 6L, 6L, 6L, 8L))
library(data.table)
DT <- as.data.table(indata)
DT[, msib:=.(list(id)), by = mid][
,msibs := mapply(setdiff, msib, id)][
,fsib := .(list(id)), by = fid][
,fsibs := mapply(setdiff, fsib, id)][
,siblist := mapply(union, msibs, fsibs)][
,c("msib","msibs", "fsib", "fsibs") := NULL]
So far so good. Works as desired. Now it should be wrapped in a function, where I can pass alternative variable names (without quoting if possible), and here's my first try.
f <- function(DT, id, fid, mid) {
DT[, msib:=.(list(id)), by = mid][
,msibs := mapply(setdiff, msib, id)][
,fsib := .(list(id)), by = fid][
,fsibs := mapply(setdiff, fsib, id)][
,siblist := mapply(union, msibs, fsibs)][
,c("msib","msibs", "fsib", "fsibs") := NULL]
}
I know this isn't working but lets look at the error it throws
indata2 <- indata
names(indata2) <- c("A", "B", "C") # Give new names
DT2 <- as.data.table(indata2)
f(DT2, A, B, C)
Error in as.vector(x, "list") : cannot coerce type 'closure' to vector of type 'list'
That makes sense. Now to make sure that the promises are evaluated correctly I tried this
f <- function(DT, id, fid, mid) {
mid <- deparse(substitute(mid))
id <- deparse(substitute(id))
fid <- deparse(substitute(fid))
DT[, msib:=.(list(id)), by = mid][
,msibs := mapply(setdiff, msib, id)][
,fsib := .(list(id)), by = fid][
,fsibs := mapply(setdiff, fsib, id)][
,siblist := mapply(union, msibs, fsibs)][
,c("msib","msibs", "fsib", "fsibs") := NULL]
}
That doesn't throw an error but also does not work. The output looks like this
f(DT2, A, B, C)
A B C siblist
1: 1 NA 0
2: 2 9 NA
3: 3 1 2
4: 4 1 2
5: 12 7 6
6: 13 5 6
7: 14 5 6
8: 15 5 8
and the siblist
column is empty which it shouldn't be and isn't when I run it manually. I also tried this version (converting it to character strings) to see if that worked:
f <- function(DT, id, fid, mid){
mid <- as.character(substitute(mid))
id <- as.character(substitute(id))
fid <- as.character(substitute(fid))
DT[, msib:=.(list(id)), by = mid][ # Siblings through the mother
,msibs := mapply(setdiff, msib, id)][
,fsib := .(list(id)), by = fid][
,fsibs := mapply(setdiff, fsib, id)][
,siblist := mapply(union, msibs, fsibs)][
,c("msib","msibs", "fsib", "fsibs") := NULL] # Removed unused
}
but that doesn't work either - same output as above. I think it may be because the promises in the j
part of the data.table
are evaluated in the wrong environment but am not sure. How can I fix my function?
Upvotes: 1
Views: 1094
Reputation: 5263
If you expect an object to have a certain structure or hold certain data, then defining a class can really help. And with S3, it's simple.
as.relationship <- function(DT, id, fid, mid) {
out <- DT[, c(id, fid, mid), with = FALSE]
setnames(out, c("id", "fid", "mid"))
setattr(out, "class", c("relationship", class(out)))
out
}
Then you can write a function to work on that class with the safety of knowing where everything is.
f <- function(DT, id, fid, mid) {
relatives <- as.relationship(DT, id, fid, mid)
relatives[
relatives,
on = "fid",
allow.cartesian = TRUE
][
relatives,
on = "mid",
allow.cartesian = TRUE
][
,
{
siblings <- union(i.id, i.id.1)
except_self <- setdiff(siblings, .BY[["id"]])
list(siblist = list(except_self))
},
by = "id"
]
}
This function takes the column names as strings. So you'd call it like this:
f(DT, "id", "fid", "mid")
# id siblist
# 1: 1
# 2: 2
# 3: 3 4
# 4: 4 3
# 5: 12 13,14
# 6: 13 14,15,12
# 7: 14 13,15,12
# 8: 15 13,14
setnames(DT, c("A", "B", "C"))
f(DT, "A", "B", "C")
# id siblist
# 1: 1
# 2: 2
# 3: 3 4
# 4: 4 3
# 5: 12 13,14
# 6: 13 14,15,12
# 7: 14 13,15,12
# 8: 15 13,14
If you're worried about performance, don't be. If you create a data.table
from entire columns of another data.table
, they're smart enough not to actually copy the data. They share it. So there's no real performance penalty to making another object.
Upvotes: 2
Reputation: 12819
This is getting ugly but it seems to work. With lots of get()
s:
f <- function(DT, id, fid, mid) {
mid <- deparse(substitute(mid))
id <- deparse(substitute(id))
fid <- deparse(substitute(fid))
DT[, msib:=.(list(get(id))), by = get(mid)][
,msibs := mapply(setdiff, msib, get(id))][
,fsib := .(list(get(id))), by = get(fid)][
,fsibs := mapply(setdiff, fsib, get(id))][
,siblist := mapply(union, msibs, fsibs)][
,c("msib","msibs", "fsib", "fsibs") := NULL]
}
DT2 <- as.data.table(indata2)
f(DT2, A, B, C)
all.equal(DT, DT2)
# [1] "Different column names"
Upvotes: 0