Hank Lin

Reputation: 6479

What does a tilde (~) in front of a single variable mean (facet_wrap)?

I am going through Hadley Wickham's "R for Data Science" where he uses ~var in ggplot calls.

I understand y ~ a + bx, where ~ describes a formula/relationship between dependent and independent variables, but what does ~var mean? More importantly, why can't you just put the variable itself? See code below:

library(tidyverse)  # loads ggplot2 (including the mpg data) and tibble

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2)

or

demo <- tribble(
  ~cut,         ~freq,
  "Fair",       1610,
  "Good",       4906,
  "Very Good",  12082,
  "Premium",    13791,
  "Ideal",      21551
)

ggplot(data = demo) +
  geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")

Upvotes: 11

Views: 5019

Answers (3)

vorpal

Reputation: 318

To understand the why part of your question, look at how the tilde is used in plotting.

lattice::xyplot(mpg ~ disp, data=mtcars)

This gives disp as the x axis (independent variable) and mpg as the y axis (dependent variable). By analogy, facet_grid() takes the RHS of the ~ as the variables to facet by in columns (i.e. the horizontal/x/independent variable) and the LHS as the variables to facet by in rows (vertical/y/dependent variable). If you give only a RHS, you are only giving the columns, which is equivalent to facet_grid(cols = vars(var)). facet_wrap() has no separate row and column dimensions, so its one-sided formula ~var simply names the variable(s) to wrap the panels by.
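A quick base R check makes the LHS/RHS idea concrete: a formula is just an unevaluated call to `~`, so its sides can be pulled out by position. This sketch uses only base R and touches neither lattice nor ggplot2; the variable names are never looked up in any data.

```r
# Two-sided formula, as in lattice::xyplot(mpg ~ disp, data = mtcars)
f <- mpg ~ disp
length(f)   # 3: the `~` itself, then the LHS, then the RHS
f[[2]]      # mpg: the left-hand side (y axis; rows in facet_grid)
f[[3]]      # disp: the right-hand side (x axis; columns in facet_grid)

# One-sided formula, as in facet_wrap(~ class)
g <- ~ class
length(g)   # 2: the `~` and a right-hand side only
g[[2]]      # class
```

Because a one-sided formula has no LHS at all, there is nothing to put on the row dimension, which is why ~var on its own only describes one faceting direction.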

Upvotes: 0

Calum You

Reputation: 15062

It is a syntax used by facet_wrap (and facet_grid), where a formula can be given as the input for the variable relationships. From the documentation for the first argument, facets:

A set of variables or expressions quoted by vars() and defining faceting groups on the rows or columns dimension. The variables can be named (the names are passed to labeller). For compatibility with the classic interface, can also be a formula or character vector. Use either a one sided formula, ~a + b, or a character vector, c("a", "b").

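So you can now give the variable names without the tilde, by quoting them with vars() (e.g. facet_wrap(vars(class))) or passing them as a character vector, but you used to need a one-sided formula with the tilde. That form is still accepted for compatibility.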
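As a sketch of the three accepted spellings, the plot from the question can be faceted with a one-sided formula, with vars(), or with a character vector. This assumes ggplot2 3.0.0 or later (where vars() was added); layer_data() is used only to check which panel each point is drawn in, so identical results suggest the three forms produce the same faceting.

```r
library(ggplot2)

# Plot from the question, without any faceting yet (mpg ships with ggplot2)
p0 <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Classic interface: one-sided formula
p_formula <- p0 + facet_wrap(~ class, nrow = 2)

# Current interface: variables quoted with vars()
p_vars <- p0 + facet_wrap(vars(class), nrow = 2)

# Character vector, also accepted for compatibility
p_chr <- p0 + facet_wrap("class", nrow = 2)

# Panel assignment of every plotted point, as a factor
panels <- function(p) layer_data(p)$PANEL

identical(panels(p_formula), panels(p_vars))
identical(panels(p_formula), panels(p_chr))
```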

Upvotes: 5

divibisan

Reputation: 12155

It's just ggplot making use of the formula structure to let the user decide what variables to facet on. From ?facet_grid:

For compatibility with the classic interface, rows can also be a formula with the rows (of the tabular display) on the LHS and the columns (of the tabular display) on the RHS; the dot in the formula is used to indicate there should be no faceting on this dimension (either row or column).

So facet_grid(. ~ var) just means to facet the grid on the variable var, with the facets spread over columns. It's the same as facet_grid(cols = vars(var)).

Despite looking like a formula, it's not really being used as a formula: it's just a way to present multiple arguments to R in a way that the facet_grid code can clearly and unambiguously interpret.
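To illustrate that the formula is only a container for variable names, here is a small hypothetical helper, `split_facet_formula()`. It is NOT ggplot2's internal code; it is a sketch that reads a `rows ~ cols` formula the way the documentation describes, treating `.` as "no faceting on this dimension". Nothing is modelled or evaluated; all.vars() just collects the symbols on each side.

```r
# Hypothetical helper (not part of ggplot2): split a facet_grid-style
# formula into row and column variable names.
split_facet_formula <- function(f) {
  stopifnot(inherits(f, "formula"))
  if (length(f) == 2L) {
    # One-sided formula such as ~ class: right-hand side only
    lhs <- NULL
    rhs <- f[[2L]]
  } else {
    # Two-sided formula such as drv ~ cyl
    lhs <- f[[2L]]
    rhs <- f[[3L]]
  }
  # Collect the symbols on one side and drop the "." placeholder
  side_vars <- function(side) {
    if (is.null(side)) return(character(0))
    setdiff(all.vars(side), ".")
  }
  list(rows = side_vars(lhs), cols = side_vars(rhs))
}

split_facet_formula(. ~ cyl)    # no row variables, columns by cyl
split_facet_formula(drv ~ cyl)  # rows by drv, columns by cyl
```

Read this way, `. ~ cyl` carries the same information as `rows = NULL, cols = vars(cyl)`, which is why the two spellings are interchangeable.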

Upvotes: 5
