Reputation: 4279
I have two very similar data frames which `ggplot2` sees differently; although the contents are the same, the data structures are subtly different. One is a `data.frame`, the other a `data_frame`. I'd like to understand the difference in how `ggplot2` sees them. In the following examples both are used in a `stat_function`; the `data.frame` produces plots while the `data_frame` produces errors. This is particularly confusing in light of the interoperability of packages in the Hadleyverse. I first ran into this issue when I found I was unable to create a plot from a data frame produced by dplyr (dplyr turns `data.frame`s into `data_frame`s), while a data frame I thought was identical (it wasn't; it was a `data.frame`) worked just fine.
Example 1
First, the working version from the `data.frame`.
```r
library(ggplot2)
library(dplyr)
d.f <- data.frame(mean = 0, sd = 1)
d_f <- data_frame(mean = 0, sd = 1)
ggplot(data.frame(x=-3:3), aes(x)) +
  stat_function(fun = function (x) dnorm(x, mean = d.f[1,1], sd = d.f[1,2]))
```
And now the non-working version from the `data_frame`.
```r
ggplot(data.frame(x=-3:3), aes(x)) +
  stat_function(fun = function (x) dnorm(x, mean = d_f[1,1], sd = d_f[1,2]))
## Warning message:
## Computation failed in `stat_function()`:
## Non-numeric argument to mathematical function
```
Example 2
This example produces a different error message, though perhaps the underlying issue is the same. First, the working version with a `data.frame`.
```r
logistic <- function (x) { 1/(1 + exp(-x)) }
d.f <- data.frame(b0 = -9, b1 = 0.8)
d_f <- data_frame(b0 = -9, b1 = 0.8)
ggplot(data.frame(x=0:20), aes(x)) +
  stat_function(fun = function (x) logistic(d.f[1,1] + d.f[1,2] * x))
```
And here's the non-working version with a `data_frame`.
```r
ggplot(data.frame(x=0:20), aes(x)) +
  stat_function(fun = function (x) logistic(d_f[1,1] + d_f[1,2] * x))
## Error in eval(expr, envir, enclos) : object 'y' not found
```
Upvotes: 2
Views: 457
Reputation: 4279
`ggplot` (more precisely, the function passed to `stat_function()`) was seeing a data frame where it expected a value.
This resulted from differences between the data types returned by the subsetting square-bracket operator `[` when applied to a `data.frame` or to a `tibble` (the data frame preferred by Hadley's `dplyr`). Subsetting a `data.frame` can change types by default: when a single column is selected, `[` drops the result to a vector, here a single value. Subsetting a `tibble` returns a `tibble` by default, unless the user extracts the values explicitly, e.g. by using `pull()` or double brackets `[[]]`. The error message "Non-numeric argument to mathematical function" should have been a clue.
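That clue can be reproduced without `ggplot2` at all, by calling `dnorm()` directly with the two kinds of subset. This is a minimal sketch that uses `tibble()` in place of the deprecated `data_frame()`; the tibble call should raise the same "Non-numeric argument to mathematical function" error that `stat_function()` reported as a warning.

```r
library(tibble)

d.f <- data.frame(mean = 0, sd = 1)
d_f <- tibble(mean = 0, sd = 1)

# Works: d.f[1, 1] is dropped to a length-one numeric vector
dnorm(0, mean = d.f[1, 1], sd = d.f[1, 2])
## [1] 0.3989423

# Fails: d_f[1, 1] is a 1 x 1 tibble (a list underneath), not a number,
# so dnorm() signals an error; capture its message instead of stopping
res <- tryCatch(
  dnorm(0, mean = d_f[1, 1], sd = d_f[1, 2]),
  error = function(e) conditionMessage(e)
)
res
```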
The following code demonstrates this by appropriately re-casting the `tibble`s.
```r
library(ggplot2)
library(dplyr)
d.f <- data.frame(mean = 0, sd = 1)
d_f <- data_frame(mean = 0, sd = 1)  # data_frame() is deprecated; tibble() is the current equivalent
```
Subsetting a `tibble` (aka `tbl_df`) returns a `tbl_df`.
```r
class(d_f[1,1])
## [1] "tbl_df" "tbl" "data.frame"
```
This can be re-cast with double square brackets `[[]]` or with `pull()`.
```r
class(d_f[[1,1]])
## [1] "numeric"
class(pull(d_f[1,1]))
## [1] "numeric"
```
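Extracting by column name sidesteps the difference altogether: `$`, `[[` with a name, and `pull()` with a column name return the same plain vector whether the input is a `data.frame` or a `tibble`. A short sketch, again assuming `tibble()` in place of `data_frame()`:

```r
library(tibble)
library(dplyr)

d.f <- data.frame(mean = 0, sd = 1)
d_f <- tibble(mean = 0, sd = 1)

# Each comparison checks that both classes yield an identical numeric vector
identical(d.f$mean, d_f$mean)
## [1] TRUE
identical(d.f[["sd"]], d_f[["sd"]])
## [1] TRUE
identical(pull(d.f, mean), pull(d_f, mean))
## [1] TRUE
```

Naming the column also documents which parameter is meant, which positional indices like `[1,2]` do not.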
Subsetting a single cell of a `data.frame` returns a numeric vector (of length one).
```r
class(d.f[1,1])
## [1] "numeric"
```
The subsetting behavior of a `tibble`, i.e. no re-casting, can be reproduced for a `data.frame` with the argument `drop = FALSE`.
```r
class(d.f[1,1, drop=FALSE])
## [1] "data.frame"
```
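This suggests the failure comes from passing any one-cell data frame, not from anything specific to tibbles. A base-R-only sketch: a plain `data.frame` subset with `drop = FALSE` should be rejected by `dnorm()` in the same way as the tibble subset.

```r
d.f <- data.frame(mean = 0, sd = 1)

# drop = FALSE keeps a 1 x 1 data.frame instead of a number
one_cell <- d.f[1, 1, drop = FALSE]

# TRUE if dnorm() raised an error, FALSE if it returned normally
failed <- tryCatch({
  dnorm(0, mean = one_cell, sd = 1)
  FALSE
}, error = function(e) TRUE)
failed
## [1] TRUE
```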
Finally, showing that resolving the type issue resolves the plotting issue:

```r
ggplot(data.frame(x=-3:3), aes(x)) +
  stat_function(fun = function (x) dnorm(x, mean = pull(d_f[1,1]), sd = pull(d_f[1,2])))
```

and

```r
ggplot(data.frame(x=-3:3), aes(x)) +
  stat_function(fun = function (x) dnorm(x, mean = d_f[[1,1]], sd = d_f[[1,2]]))
```

both produce the expected plot.
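The fix above is only shown for Example 1. Assuming Example 2 fails for the same underlying reason (the function receives one-cell tibbles instead of numbers), the same kind of extraction should make it plot too. This sketch pulls the coefficients out by name once, before building the plot, and uses `tibble()` in place of the deprecated `data_frame()`; `curve_fun` is a hypothetical name introduced only for illustration.

```r
library(ggplot2)
library(tibble)

logistic <- function(x) 1 / (1 + exp(-x))
d_f <- tibble(b0 = -9, b1 = 0.8)

# Extract plain numbers from the tibble once, outside the plotting function
b0 <- d_f[["b0"]]
b1 <- d_f[["b1"]]

# The function handed to stat_function() now only works with numeric vectors
curve_fun <- function(x) logistic(b0 + b1 * x)

p <- ggplot(data.frame(x = 0:20), aes(x)) +
  stat_function(fun = curve_fun)
print(p)
```

Extracting the values up front also means the plotting function no longer depends on whether the source object is a `data.frame` or a `tibble`.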
Upvotes: 3