Jaywalker

Reputation: 49

dplyr 'object not found' median only

This problem has me stumped.

I have the following data frame:

library(dplyr)

# approximation of data frame
x <- data.frame(doy = sample(200:300, 20, replace = TRUE),
                year = sample(c("2000", "2005"), 20, replace = TRUE),
                phase = sample(c("pre", "post"), 20, replace = TRUE))

and a simple 'summarize' function that takes the column name as a string and works nicely:

getStats <- function(df, col) {
  col <- as.name(col)  # turn the column-name string into a symbol
  df %>%
    group_by(year, phase) %>%
    summarize(n = sum(!is.na(col)),
              mean = mean(col, na.rm = TRUE),
              sd = sd(col, na.rm = TRUE),
              se = sd/sqrt(n))
}

> getStats(x, "doy")
Source: local data frame [4 x 6]
Groups: year [?]

    year  phase     n    mean       sd       se
  <fctr> <fctr> <int>   <dbl>    <dbl>    <dbl>
1   2000   post     8 248.625 30.42526 10.75695
2   2000    pre     2 290.000 14.14214 10.00000
3   2005   post     5 231.400 32.86031 14.69558
4   2005    pre     5 274.200 29.79429 13.32441

However, if I modify the function to also compute the median, it returns an error:

getStats <- function(df, col) {
  col <- as.name(col)  # turn the column-name string into a symbol
  df %>%
    group_by(year, phase) %>%
    summarize(n = sum(!is.na(col)),
              mean = mean(col, na.rm = TRUE),
              med = median(col, na.rm = TRUE), # new line
              sd = sd(col, na.rm = TRUE),
              se = sd/sqrt(n))
}

> getStats(x, "doy")

Error in median (doy, na.rm = TRUE): object "doy" not found

I've tried a host of name and position changes, but all yield the same result: 'median' doesn't accept the column name when it is passed in as a variable. I assume I'm missing something so basic that I'll facepalm when someone points it out, but in the meantime I feel like I'm losing my sanity. I appreciate any insights!

Upvotes: 2

Views: 1082

Answers (1)

Ben Bolker

Reputation: 226087

Your proximal problem may be that median doesn't have a '...' argument, while mean does (I'm not sure why sd works, though; maybe an interaction between methods and '...'?).
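
To check the '...' part of this hypothesis on your own R installation, you can inspect each function's formal arguments with base R's formals(). This is a minimal sketch; has_dots() is a hypothetical helper defined here, not part of any package, and the result for median depends on your R version, because current R declares median(x, na.rm = FALSE, ...).

```r
# Hypothetical helper: does function f declare a '...' formal argument?
has_dots <- function(f) "..." %in% names(formals(f))

# Check the three functions used in the summarise() call
sapply(list(mean = mean, median = stats::median, sd = stats::sd), has_dots)
```

mean() reports TRUE and sd() reports FALSE, so sd() lacks '...' too, which is consistent with the caveat that a missing '...' may not be the whole story.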

In any case, IMO the right way to handle this sort of problem is to use standard evaluation rather than non-standard evaluation, i.e. summarise_ rather than summarise, as illustrated in vignette("nse", package = "dplyr"):

I'm illustrating how this works in the global environment rather than inside a function, but I don't think that should matter:

col <- "doy"
funs <- c("n", "mean", "stats::median", "sd", "se")
## put together function calls
dots <- c(sprintf("sum(!is.na(%s))", col),
          sprintf("%s(%s,na.rm=TRUE)", funs[2:4], col),
          "sd/sqrt(n)")
names(dots) <- gsub("^.*::", "", funs)  ## strip the "stats::" prefix (ugh)
dots
##                              n                            mean 
##              "sum(!is.na(doy))"          "mean(doy,na.rm=TRUE)" 
##                        median                              sd 
## "stats::median(doy,na.rm=TRUE)"            "sd(doy,na.rm=TRUE)" 
##                              se 
##                    "sd/sqrt(n)" 

x %>% 
    group_by(year, phase) %>% 
    summarise_(.dots=dots)

The only annoying thing here is that, for some reason, dplyr can't find median unless I call it as stats::median, which means we have to work a little harder (stripping the stats:: prefix) to get clean column names. The standard-evaluation method is a little uglier, but that's the price you pay for this kind of flexibility.
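
For readers on current dplyr: summarise_() and the other underscore verbs have been deprecated since dplyr 0.7.0. A minimal sketch of the same summary, assuming dplyr 1.0 or later (for the .groups argument), uses the .data pronoun to look up a column from a string, so no call strings are built and median() is written plainly. This is a swapped-in technique, not the method described above, and the data frame is a random stand-in shaped like the question's x.

```r
library(dplyr)

# Toy data shaped like the question's x (random values, not the asker's data)
set.seed(1)
x <- data.frame(doy = sample(200:300, 20, replace = TRUE),
                year = sample(c("2000", "2005"), 20, replace = TRUE),
                phase = sample(c("pre", "post"), 20, replace = TRUE))

col <- "doy"  # the column name is held in a string

# .data[[col]] looks the column up by name inside the data mask
res <- x %>%
  group_by(year, phase) %>%
  summarise(n = sum(!is.na(.data[[col]])),
            mean = mean(.data[[col]], na.rm = TRUE),
            median = median(.data[[col]], na.rm = TRUE),
            sd = sd(.data[[col]], na.rm = TRUE),
            se = sd / sqrt(n),  # refers to the sd and n columns created above
            .groups = "drop")
res
```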

To embed this in a function, I would probably split off getStats at a different place, i.e. leave the grouping outside the function:

getStats <- function(data, col) {
  ## if you want to pass a string argument instead, remove
  ## the next line
  col <- deparse(substitute(col))
  funs <- c("n", "mean", "stats::median", "sd", "se")
  dots <- c(sprintf("sum(!is.na(%s))", col),
            sprintf("%s(%s,na.rm=TRUE)", funs[2:4], col),
            "sd/sqrt(n)")
  names(dots) <- gsub("^.*::", "", funs)  ## strip the "stats::" prefix (ugh)
  summarise_(data, .dots = dots)
}

x %>% group_by(year,phase) %>% getStats(doy)

Because the grouping happens outside getStats, this gives you more flexibility to use different groupings.
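
If you want the same unquoted-argument interface (getStats(doy)) without the deprecated summarise_(), a minimal sketch for a recent dplyr (assumed 1.0 or later) uses the embracing operator {{ }} from rlang. The grouping again stays outside the function. This is an alternative, not a transcription of the version above, and the name getStats2 is hypothetical.

```r
library(dplyr)

# Hypothetical name; takes an unquoted column, like getStats(doy) above
getStats2 <- function(data, col) {
  summarise(data,
            n = sum(!is.na({{ col }})),
            mean = mean({{ col }}, na.rm = TRUE),
            median = median({{ col }}, na.rm = TRUE),
            sd = sd({{ col }}, na.rm = TRUE),
            se = sd / sqrt(n),  # uses the sd and n columns created above
            .groups = "drop")
}

# Toy data shaped like the question's x (random values)
set.seed(1)
x <- data.frame(doy = sample(200:300, 20, replace = TRUE),
                year = sample(c("2000", "2005"), 20, replace = TRUE),
                phase = sample(c("pre", "post"), 20, replace = TRUE))

# Grouping stays outside the function, so any grouping works
by_year_phase <- x %>% group_by(year, phase) %>% getStats2(doy)
by_year <- x %>% group_by(year) %>% getStats2(doy)
```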

Upvotes: 3
