Reputation: 19803
I would like to be able to write a function that runs regressions in a data.table
by groups and then nicely organizes the results. Here is a sample of what I would like to do:
require(data.table)
dtb = data.table(y=1:10, x=10:1, z=sample(1:10), weights=1:10, thedate=1:2)
models = c("y ~ x", "y ~ z")
res = lapply(models, function(f) {dtb[,as.list(coef(lm(f, weights=weights, data=.SD))),by=thedate]})
#do more stuff with res
I would like to wrap all this into a function, since the #do more stuff part might be long. The issue I face is how to pass the various names of things to data.table. For example, how do I pass the column name weights? How do I pass thedate? I envision a prototype that looks like this:
myfun = function(dtb, models, weights, dates)
Let me be clear: passing the formulas to my function is NOT the problem. If the name of the weights column and the name of the date column, thedate, were known in advance, then my function could simply look like this:
myfun = function(dtb, models) {
res = lapply(models, function(f) {dtb[,as.list(coef(lm(f, weights=weights, data=.SD))),by=thedate]})
#do more stuff with res
}
However, the column names corresponding to thedate and to the weights are unknown in advance. I would like to pass them to my function like so:
#this will not work
myfun = function(dtb, models, w, d) {
res = lapply(models, function(f) {dtb[,as.list(coef(lm(f, weights=w, data=.SD))),by=d]})
#do more stuff with res
}
Thanks
Upvotes: 12
Views: 3456
Reputation: 115382
Here is a solution that relies on having the data in long format (which makes more sense to me, in this case):
library(reshape2)
dtlong <- data.table(melt(dtb, measure.vars = c('x','z')))
foo <- function(f, d, by, w ){
# get the name of the w argument (weights)
w.char <- deparse(substitute(w))
# convert `list(a,b)` to `c('a','b')`
# obviously, this would have to change depending on how `by` was defined
by <- unlist(lapply(as.list(as.list(match.call())[['by']])[-1], as.character))
# create the call substituting the names as required
.c <- substitute(as.list(coef(lm(f, data = .SD, weights = w))), list(f = f, w = as.name(w.char)))
# actually perform the calculations
d[,eval(.c), by = by]
}
foo(f= y~value, d= dtlong, by = list(variable, thedate), w = weights)
   variable thedate (Intercept)       value
1:        x       1   11.000000 -1.00000000
2:        x       2   11.000000 -1.00000000
3:        z       1    1.009595  0.89019190
4:        z       2    7.538462 -0.03846154
Upvotes: 7
Reputation: 19803
One possible solution:
fun = function(dtb, models, w_col_name, date_name) {
  res = lapply(models, function(f) {
    dtb[, as.list(coef(lm(f, weights = eval(parse(text = w_col_name)), data = .SD))),
        by = eval(parse(text = paste0("list(", date_name, ")")))]
  })
}
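The eval(parse(...)) calls above can probably be avoided. Here is a minimal sketch, assuming the weights and date columns are passed as character strings. The helper name fit_by_group and its arguments w and d are made up for illustration. It relies on data.table accepting a character vector of column names in by, and on do.call() evaluating the weights before lm() sees them:

```r
library(data.table)

# Hypothetical helper: w and d are column names given as character strings.
fit_by_group <- function(dtb, models, w, d) {
  lapply(models, function(f) {
    f <- as.formula(f)
    # do.call() evaluates its arguments first, so lm() receives the weight
    # values themselves rather than an unevaluated column name
    dtb[, as.list(coef(do.call(lm, list(formula = f, data = .SD, weights = .SD[[w]])))),
        by = d]
  })
}

# Same layout as the question's sample data, with a fixed z for reproducibility
dtb <- data.table(y = 1:10, x = 10:1, z = c(3, 7, 1, 9, 4, 10, 2, 8, 6, 5),
                  weights = 1:10, thedate = rep(1:2, 5))
res <- fit_by_group(dtb, c("y ~ x", "y ~ z"), w = "weights", d = "thedate")
```

Each element of res should be a data.table with one row per value of the grouping column, ready for the #do more stuff step.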
Upvotes: 3
Reputation: 263331
Can't you just add (inside that anonymous function call):
f <- as.formula(f)
... as a separate line before the dtb[,as.list(coef(lm(f, ...) call? That's the usual way of turning a character element into a formula object.
> res = lapply(models, function(f) {f <- as.formula(f)
dtb[,as.list(coef(lm(f, weights=weights, data=.SD))),by=thedate]})
>
> str(res)
List of 2
$ :Classes ‘data.table’ and 'data.frame': 2 obs. of 3 variables:
..$ thedate : int [1:2] 1 2
..$ (Intercept): num [1:2] 11 11
..$ x : num [1:2] -1 -1
..- attr(*, ".internal.selfref")=<externalptr>
$ :Classes ‘data.table’ and 'data.frame': 2 obs. of 3 variables:
..$ thedate : int [1:2] 1 2
..$ (Intercept): num [1:2] 6.27 11.7
..$ z : num [1:2] 0.0633 -0.7995
..- attr(*, ".internal.selfref")=<externalptr>
If you need to build character versions of formulas from component names, just use paste or paste0 and pass the result as the models character vector. Tested code supplied with receipt of testable examples.
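As a rough sketch of the paste approach, assuming the response and predictor names are held in character vectors (the variable names response and predictors are invented here):

```r
# Hypothetical component names, matching the question's sample data
response   <- "y"
predictors <- c("x", "z")

# paste() is vectorised, so this yields one formula string per predictor
models <- paste(response, "~", predictors)

# stats::reformulate() is an alternative that returns a formula object directly
f <- reformulate("x", response = "y")
```

The resulting models vector has the same form as the one in the question, so it can be passed along unchanged.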
Upvotes: 0