Why does ggplot2 see a data.frame and data_frame differently?

Question

I have two very similar data frames, which ggplot2 sees differently; although the contents are the same the data structures are subtly different. One is a data.frame, the other a data_frame. I'd like to understand the difference in how ggplot2 sees them. In the following examples, both are being used in a stat_function; the data.frame produces plots while the data_frame produces errors. This is particularly confusing in light of the interoperability of packages in the Hadleyverse. I first ran into this issue when I found that I was unable to create a plot from a data frame produced by dplyr (dplyr turns data.frames into data_frames) while a data frame I thought was identical (it wasn't, it was a data.frame) worked just fine.

Example 1

First, the working version from the data.frame.

library(ggplot2)
library(dplyr)

d.f <- data.frame(mean = 0, sd = 1)
d_f <- data_frame(mean = 0, sd = 1)

ggplot(data.frame(x=-3:3), aes(x)) +
  stat_function(fun = function (x) dnorm(x, mean = d.f[1,1], sd = d.f[1,2]))

And now the non-working version from the data_frame.

ggplot(data.frame(x=-3:3), aes(x)) +
  stat_function(fun = function (x) dnorm(x, mean = d_f[1,1], sd = d_f[1,2]))
## Warning message:
## Computation failed in `stat_function()`:
## Non-numeric argument to mathematical function

Example 2

This example produces a different error message, though the underlying issue may be the same. First, the working version with a data.frame.

logistic <- function (x) { 1/(1 + exp(-x)) }

d.f <- data.frame(b0 = -9, b1 = 0.8) 
d_f <- data_frame(b0 = -9, b1 = 0.8) 

ggplot(data.frame(x=0:20), aes(x)) +
  stat_function(fun = function (x) logistic(d.f[1,1] + d.f[1,2] * x))

And here's the non-working version with a data_frame.

ggplot(data.frame(x=0:20), aes(x)) +
  stat_function(fun = function (x) logistic(d_f[1,1] + d_f[1,2] * x))
## Error in eval(expr, envir, enclos) : object 'y' not found

Gregory · Accepted Answer

The function passed to stat_function was receiving a one-cell data frame where it expected a numeric value.

This results from a difference in what the single square-bracket operator returns when applied to a data.frame versus a tibble (the data frame class used by Hadley's dplyr). Subsetting a data.frame with [ drops dimensions by default, so selecting a single cell returns a plain vector or value. Subsetting a tibble with [ returns a tibble by default, unless the user explicitly extracts the value, e.g. by using pull or double brackets [[]]. The error message "Non-numeric argument to mathematical function" should have been a clue.

The following code demonstrates this by extracting the values from the tibbles explicitly.

library(ggplot2)
library(dplyr)

d.f <- data.frame(mean = 0, sd = 1)
d_f <- data_frame(mean = 0, sd = 1)

Subsetting a tibble (aka tbl_df) returns a tbl_df.

class(d_f[1,1])
## [1] "tbl_df"     "tbl"        "data.frame"

The result can be converted to a plain numeric value with double square brackets [[]] or pull.

class(d_f[[1,1]])
## [1] "numeric"
class(pull(d_f[1,1]))
## [1] "numeric"

Subsetting a data.frame returns a numeric vector.

class(d.f[1,1])
## [1] "numeric"

The tibble behavior, i.e. no dropping to a vector, can be reproduced on a data.frame with the argument drop=FALSE.

class(d.f[1,1, drop=FALSE])
## [1] "data.frame"
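
Extraction by column name works the same way and avoids depending on column positions. The following is a minimal sketch, assuming the tibble package is installed; it uses tibble() rather than data_frame(), since data_frame() has been deprecated in favor of tibble() in later releases.

```r
library(tibble)

d_f <- tibble(mean = 0, sd = 1)

# `$` and `[[` with a column name both return the underlying column vector,
# not a one-cell tibble.
m1 <- d_f$mean
m2 <- d_f[["sd"]]

# Single-bracket subsetting keeps the tibble wrapper.
cell <- d_f[1, 1]

is.numeric(m1)       # TRUE
is.numeric(m2)       # TRUE
is.data.frame(cell)  # TRUE
```

Because the tibble here has a single row, d_f$mean and d_f[["sd"]] are length-one numeric vectors and can be passed straight to dnorm.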

Finally, showing that resolving the type issue resolves the plotting issue ...

ggplot(data.frame(x=-3:3), aes(x)) +
  stat_function(fun = function (x) dnorm(x, mean = pull(d_f[1,1]), sd = pull(d_f[1,2])))

and

ggplot(data.frame(x=-3:3), aes(x)) +
  stat_function(fun = function (x) dnorm(x, mean = d_f[[1,1]], sd = d_f[[1,2]]))

both produce the expected plot.
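
The same approach should apply to Example 2 from the question. The sketch below assumes ggplot2 and tibble are installed, uses tibble() in place of data_frame(), and introduces the helper variables b0 and b1 (not in the original) so the coefficients are extracted as plain numbers once, outside the function passed to stat_function.

```r
library(ggplot2)
library(tibble)

logistic <- function(x) 1 / (1 + exp(-x))

d_f <- tibble(b0 = -9, b1 = 0.8)

# Extract plain numeric scalars with [[ ]]; d_f[1, 1] would stay a tibble.
b0 <- d_f[["b0"]]
b1 <- d_f[["b1"]]

# Build the plot; the closure now only sees numeric values.
p <- ggplot(data.frame(x = 0:20), aes(x)) +
  stat_function(fun = function(x) logistic(b0 + b1 * x))

print(p)
```

Pulling the values out before building the plot keeps the closure free of any data frame subsetting, so the function behaves the same whether the coefficients originally came from a data.frame or a tibble.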
