What are the advantages of using with() vs. calling vectors?

Question

Are there any advantages to using with() rather than calling the vector by name (aside from using fewer keystrokes)?

For example, is with(d,x1) always equivalent to d$x1?

where d is

structure(list(x1 = c(-1.96300839219158, -1.7799470435444, -0.247433477421076, 
-0.333402872895705, -1.37145403620246, -0.23484024054114, -0.808080155419075, 
-0.359895157796401, 0.54316873679816, -0.687429214935226), x2 = c(-0.619089899920824, 
-0.0716448494478719, -0.136643798928645, 2.58777656543295, 0.758900617148999, 
0.687980864291582, 0.442931351818574, -0.734342463692198, 2.55862689249189, 
1.30677108261702)), .Names = c("x1", "x2"), row.names = c(NA, 
-10L), class = "data.frame")

Alex A. · Accepted Answer

If you're just referencing an item in a list, e.g. a column in a data frame, then d$x1 and with(d, x1) will both return x1 from d. However, using with() on its own like this is rather unusual, and it's not really the intended purpose of with(); extracting a value from a list is what $ is for.

The advantage of using with() is that it evaluates an expression in the context of a single data object, without you having to worry about global variables or attached data frames making references to variables ambiguous.

The $ syntax does not support expressions, so to perform a calculation involving multiple variables in a data frame, you would need to write d$x1, d$x2, etc. for every reference, which is inconvenient. For simply extracting an item from a list, though, $ is preferred.
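To make the difference in convenience concrete, here is a minimal sketch that computes the same quantity both ways. The two-column data frame and the particular expression are made up for illustration; they are not from the question.

```r
# Illustrative data frame (values chosen for the example, not from the question).
d <- data.frame(x1 = c(1, 2, 3), x2 = c(10, 20, 30))

# With $, every column reference has to repeat the data frame name.
r_dollar <- (d$x1 + d$x2) / mean(d$x1)

# With with(), the expression is evaluated with the columns of d in scope.
r_with <- with(d, (x1 + x2) / mean(x1))

identical(r_dollar, r_with)
```

Both forms give the same result; with() only starts to pay off once the expression mentions several columns.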

A notable case in which the two methods are not equivalent is as follows. Suppose d is defined as

d <- data.frame(x1=c(1, 2, 3))

Now define y <- "x1". What happens when we try to reference x1 using y?

> d$y
NULL

> with(d, y)
[1] "x1"

> d[, y]
[1] 1 2 3

d$y returns NULL since there is no column y in d, so there's nothing to extract.

And since there's no column y in d, with(d, y) falls back to looking for y in the environment from which with() was called, which in this case is the global environment. So this evaluates y in the global environment and thus returns "x1". Even though there's nothing to extract, there is something to evaluate because y does exist, just not in d.

Now d[, y] gets us what we want. This first evaluates y, which turns this into d[, "x1"], which is the correct syntax for extracting x1 from d using another variable.
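For completeness, here is a short sketch of a few ways to extract a column whose name is stored in another variable, using the same d and y as above. The with(d, get(y)) line is an extra illustration rather than something the answer recommends.

```r
d <- data.frame(x1 = c(1, 2, 3))
y <- "x1"

a <- d[[y]]            # list-style extraction; returns the column as a vector
b <- d[, y]            # matrix-style extraction; also a vector for a single column
e <- d[y]              # single bracket without a comma keeps a one-column data frame
g <- with(d, get(y))   # y is found in the calling environment, then "x1" is looked up in d
```

In each case y is evaluated first, so the lookup is done with the string "x1" rather than with the literal name y.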

Some finer detail courtesy of David Arenburg:

Note that with() is actually a generic function that performs method dispatch, whereas $ is a primitive. An inspection of base:::with.default is illuminating:

function(data, expr, ...)
eval(substitute(expr), data, enclos = parent.frame())

This serves to confirm that with() is for evaluation.
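The lookup order implied by that definition can be seen with a small re-implementation. This is only a sketch: my_with and f are hypothetical names introduced here, and the body simply mirrors the with.default code quoted above.

```r
# Hypothetical stand-in for with.default, for illustration only.
my_with <- function(data, expr) {
  # Capture expr unevaluated, then evaluate it with the columns of data first
  # and the caller's environment as the fallback.
  eval(substitute(expr), data, enclos = parent.frame())
}

d <- data.frame(x1 = c(1, 2, 3))

f <- function() {
  k <- 2               # local to f, not a column of d
  my_with(d, x1 * k)   # x1 comes from d, k from f's own frame
}

res <- f()
```

Names are resolved in the data first and in the calling environment second, which is why with(d, y) in the earlier example returned the global y.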

Since $ is a primitive, printing it shows .Primitive("$"), which means that it calls an entry point in compiled internal code. Doing a bit of hunting shows that $ goes to an entry point called do_subset3 in subset.c. The comment immediately preceding that piece of C code is equally illuminating:

/* The $ subset operator.
   We need to be sure to only evaluate the first argument.
   The second will be a symbol that needs to be matched, not evaluated.
*/

This serves to confirm that $ is for extraction, not evaluation.
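The same point can be checked from the R side. The following is a small sketch that reuses d and y from the earlier example and inspects how the call d$y is represented.

```r
d <- data.frame(x1 = c(1, 2, 3))
y <- "x1"

prim <- is.primitive(`$`)   # $ is implemented in compiled code

cl <- quote(d$y)            # the unevaluated call `$`(d, y)
arg_class <- class(cl[[3]]) # the second argument is stored as a symbol

missing_col <- is.null(d$y) # d has no column literally named "y"
```

The right-hand side of $ is matched by name and never evaluated, so the value of y plays no part in the lookup.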

So in short, as David put it so well in a comment, with() and $ have different purposes which in certain circumstances can overlap.
