Annerose N

Reputation: 487

Using ddply inside a function (non-standard evaluation)

I am using ddply (from the plyr package in R) inside a wrapper function. I want to summarize my dataset grouped by the values of a variable, and the wrapper function should take the name of that grouping variable as an argument.

Without a wrapper function, I can take the following approach:

require(plyr)

# Create sample dataframe:
sample_df <- data.frame(a = rep(1:3, 2), b = rep(3:1, 2), c = rep(c("a", "b"), 3))

sample_df
#   a b c
# 1 1 3 a
# 2 2 2 b
# 3 3 1 a
# 4 1 3 b
# 5 2 2 a
# 6 3 1 b

# Use ddply to summarize the dataframe:
ddply(sample_df, .(a), summarize, mean = mean(b), var = var(b))
#   a mean var
# 1 1    3   0
# 2 2    2   0
# 3 3    1   0

However, when I wrap the same call in a function, I don't get the same result:

sumfun <- function(df, v) { # summarize a given data frame by a given variable
  d <- ddply(df, .(v), summarize, mean = mean(b), var = var(b))
  return(d)
}

# Output using the function:
sumfun(sample_df, "a")
#   v mean var
# 1 a    3  NA

Why does ddply behave differently inside a function? I have tried using substitute(v) and eval(substitute(v)) inside the function, but neither makes a difference.

Upvotes: 1

Views: 252

Answers (1)

GGamba

Reputation: 13680

The plyr package and its ddply function have largely been superseded by dplyr, tidyr and related packages (collectively known as the tidyverse).
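If you would rather stay with plyr, the original function can likely be fixed directly. As far as I can tell, `.(v)` quotes the symbol `v` itself; since the data frame has no column named `v`, it is looked up in the function's environment and found as the length-one string "a", which is why the output has a single group labelled `v`. A minimal sketch, assuming ddply's documented behaviour that `.variables` also accepts a character vector of column names, passes the string straight through instead of wrapping it in `.()`. The name `sumfun_plyr` is hypothetical.

```r
library(plyr)

sample_df <- data.frame(a = rep(1:3, 2), b = rep(3:1, 2), c = rep(c("a", "b"), 3))

# Hypothetical helper: `v` is a column name given as a string, e.g. "a".
# ddply() accepts a character vector for .variables, so no .() is needed;
# .(v) would instead quote the symbol `v`, not the column it names.
sumfun_plyr <- function(df, v) {
  ddply(df, v, summarize, mean = mean(b), var = var(b))
}

sumfun_plyr(sample_df, "a")
#   a mean var
# 1 1    3   0
# 2 2    2   0
# 3 3    1   0
```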

# library(tidyverse)
library(dplyr)

What you are trying to accomplish can be translated like this:

sample_df %>% 
    group_by(a) %>% 
    summarize(mean = mean(b), var = var(b))
# # A tibble: 3 × 3
#       a  mean   var
#   <int> <dbl> <dbl>
# 1     1     3     0
# 2     2     2     0
# 3     3     1     0

And, for the function approach:

sumfun <- function(df, v) {
    df %>% 
        group_by_(v) %>% 
        summarize(mean = mean(b), var = var(b))
}

sumfun(sample_df, 'a')
# # A tibble: 3 × 3
#       a  mean   var
#   <int> <dbl> <dbl>
# 1     1     3     0
# 2     2     2     0
# 3     3     1     0

Note the trailing underscore in group_by_() inside the function: this is the standard-evaluation variant, which accepts the grouping variable as a string. See vignette("nse") for details.
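group_by_() and the other underscore verbs were deprecated in dplyr 0.7.0 in favour of tidy evaluation. A minimal sketch of the same function with tidy evaluation, assuming dplyr 0.7.0 or later: `rlang::sym()` turns the string into a column symbol and `!!` injects it, so the call behaves as if `group_by(a)` had been written directly. The name `sumfun_tidy` is hypothetical.

```r
library(dplyr)

sample_df <- data.frame(a = rep(1:3, 2), b = rep(3:1, 2), c = rep(c("a", "b"), 3))

# Hypothetical helper: `v` is a column name given as a string, e.g. "a".
# rlang::sym(v) builds the symbol `a`; `!!` splices it into group_by(),
# so the grouping column in the result keeps its original name.
sumfun_tidy <- function(df, v) {
  df %>%
    group_by(!!rlang::sym(v)) %>%
    summarize(mean = mean(b), var = var(b))
}

sumfun_tidy(sample_df, "a")
sumfun_tidy(sample_df, "c")  # any column name works the same way
```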

Upvotes: 2
