Reputation: 866
Sample data:
pp.inc <- structure(list(has.di.rec.pp = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0), m.dist.km2 = c(-34.4150009155273, 6.80600023269653, -6.55499982833862,
-61.7700004577637, 15.6840000152588, -11.2869997024536, -26.9729995727539,
0, 81.9940032958984, -35.1459999084473, -12.5179996490479, 0,
21.5919990539551, 81.9940032958984, -20.7770004272461, 85.9469985961914,
-15.2959995269775, -75.5879974365234, 81.9940032958984, 3.04999995231628,
-17.1490001678467, -25.806999206543, -16.0060005187988, -14.91100025177,
-12.9020004272461, -16.0060005187988, 5.44000005722046, -34.4150009155273,
81.9940032958984, 3.61400008201599, 13.7379999160767, 2.71300005912781,
4.31300020217896), treated = c(0, 1, 0, 0, 1, 0, 0, 1, 1, 0,
0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1,
1, 1)), .Names = c("has.di.rec.pp", "m.dist.km2", "treated"), row.names = c(NA,
-33L), class = c("data.table", "data.frame"))
Code:
library(data.table)
library(ggplot2)
rddplot <- function(data, outcome, runvar, treatment = treated, span, bw, ...){
data <- data.table(data)
data.span <- data[abs(runvar) <= span, ]
data.span <- data.span[ , bins := cut(runvar,
seq(-span, span, by = bw),
include.lowest = TRUE, right = FALSE)]
data.span.plot <- data.span[ , list(avg.outcome = mean(outcome),
avg.runvar = mean(runvar),
treated = max(treatment),
n.iid = length(outcome)), keyby = bins]
data.span.plot <- data.span.plot[ , runvar := head(seq(-span, span, by = bw), -1)]
bp <- ggplot(data = data.span.plot, aes(x = runvar, y = avg.outcome))
bp <- bp + geom_point(aes(colour = n.iid))
bp <- bp + stat_smooth(data = data.span, aes(x = runvar, y = outcome,
group = factor(treatment)), ...)
bp
return(bp)
}
rddplot(pp.inc, has.di.rec.pp, m.dist.km2, treated, 50, 5)
This code runs perfect if I do not wrap it in a function. I am a novice in R, only using it very infrequently. What am I doing wrong? Am I missing something obvious or is it to do with data.table
or ggplot2
? I thought it might be something with ggplot, as other questions mention there is an issue and aes_string
should be used. I can rewrite the data.table
parts to use base functions. But I think the error already occurs before that, on the second line. How do I make this work?
EDIT:
[Original title: R function returns Error in eval(expr, envir, enclos) : object 'name' not found]
I had some time to look at this again and have worked out a solution, hence I also modified the title a bit. Using eval()
didn't really work out for me, so I went the [['columname']]
selection route. I've ditched data.table
(and plyr
as well), so that this only uses base
functions except for ggplot2. I am happy for any comments on how to improve it. Please let me know if there are some essential flaws. If not I will add an answer with my solution later.
I have changed the bin calculation so that there is always a breakpoint at zero, which is necessary. Default binwidth is determined by the Silverman rule. I am thinking of calculating model fit separately and returning it, as the model choice within ggplot is limited, however I can't think of a nice way to incorporate this for a variety of diverse models such as lm or loess, and it's not strictly necessary. I actually wanted to overlay a thin bar plot displaying the number of observations in each bin, but found out this is impossible in ggplot (I know this generally is a bad idea, but there are several well-published papers which use similar graphs). I don't find the size
aestetic to appealing here, but these are really minor gripes.
Thanks for getting me on the right path.
My solution:
rddplot <- function(data, outcome, runvar, treatment = treated,
span, bw = bw.nrd0(data[[runvar]]), ...){
breaks <- c(sort(-seq(0, span, by = bw)[-1]), seq(0, span, by = bw))
data.span <- data[abs(data[[runvar]]) <= max(breaks), ]
data.span$bins <- cut(data.span[[runvar]], breaks,
include.lowest = TRUE, right = FALSE)
data.span.plot <- as.data.frame(cbind(tapply(data.span[[outcome]], data.span$bins, mean),
tapply(data.span[[runvar]], data.span$bins, mean),
tapply(data.span[[treatment]], data.span$bins, max),
tapply(data.span[[outcome]], data.span$bins, length),
tapply(data.span[[outcome]], data.span$bins, sum)))
colnames(data.span.plot) <- c("avg.outcome", "avg.runvar", "treated", "n.iid", "n.rec")
data.span.plot$runvar <- head(breaks, -1)
print(data.span.plot)
bp <- ggplot(data = data.span.plot, aes(x = runvar, y = avg.outcome))
bp <- bp + geom_point(aes(size = n.iid))
bp <- bp + stat_smooth(data = data.span, aes_string(x = runvar, y = outcome,
group = treatment), ...)
print(bp)
}
Call:
rddplot(pp.inc, "has.di.rec.pp", "m.dist.km2", "treated", 50,
method = lm, formula = y ~ poly(x, 4, raw = TRUE))
Upvotes: 1
Views: 917
Reputation: 115435
I have an approach using data.table
and some deparse(substitute())
and setnames
trickery....
rddplot <- function(data, outcome, runvar, treatment = treated, span, bw, ...){
# convert to data.table
data <- data.table(data)
# get the column names as defined in the call to rddplot
outname <- deparse(substitute(outcome))
runname <- deparse(substitute(runvar))
treatname <- deparse(substitute(treatment))
# rename these columns with the argument namses
setnames(data, old = c(outname,runname,treatname), new = c('outcome','runvar', 'treatment'))
# breaks as defined in the second example
breaks <- c(sort(-seq(0, span, by = bw)[-1]), seq(0, span, by = bw))
# the stuff you were doing before
data.span <- data[abs(runvar) <= span, ]
data.span <- data.span[ , bins := cut(runvar,
breaks,
include.lowest = TRUE, right = FALSE)]
data.span.plot <- data.span[ , list(avg.outcome = mean(outcome),
avg.runvar = mean(runvar),
treated = max(treatment),
n.iid = length(outcome)), keyby = bins]
# note I've removed trying to add `runvar` column to data.span.plot....)
bp <- ggplot(data = data.span.plot, aes(x = avg.runvar, y = avg.outcome))
bp <- bp + geom_point(aes(colour = n.iid))
bp <- bp + stat_smooth(data = data.span, aes(x = runvar, y = outcome,
group = treatment), ...)
bp
}
rddplot(pp.inc, has.di.rec.pp, m.dist.km2, treated, 50, 5)
Note that if you didn't convert to data.table within the function, and assumed the data argument was a data.table, then you could use on.exit() to revert the names changed by reference.
Upvotes: 2