Reputation: 487
I am using ddply (from the plyr package in R) inside a wrapper function. I want to summarize my dataset grouped by the value of a variable, and the wrapper function should take the name of that grouping variable as an argument.
Without a wrapper function, I can take the following approach:
require(plyr)
# Create sample dataframe:
sample_df <- data.frame(a = rep(1:3, 2), b = rep(3:1, 2), c = rep(c("a", "b"), 3))
sample_df
a b c
1 1 3 a
2 2 2 b
3 3 1 a
4 1 3 b
5 2 2 a
6 3 1 b
# Use ddply to summarize the dataframe:
ddply(sample_df, .(a), summarize, mean = mean(b), var = var(b))
a mean var
1 1 3 0
2 2 2 0
3 3 1 0
However, when I use a wrapper function, I don't get the same results:
sumfun <- function(df, v) { # summarize a given dataframe by a given variable,
d <- ddply(df, .(v), summarize, mean = mean(b), var = var(b))
return(d)
}
# Output using the function:
sumfun(sample_df, "a")
v mean var
1 a 3 NA
Why does the behavior of ddply differ when using it in a function? I have tried using substitute(v) and eval(substitute(v)) inside the function, but it doesn't make a difference.
Upvotes: 1
Views: 252
Reputation: 13680
The plyr package and its ddply function are somewhat outdated and have largely been superseded by dplyr, tidyr and related packages (collectively known as the tidyverse).
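If you need to stay with plyr, the likely cause of the difference is that .(v) captures the symbol v itself rather than the column name stored in it, so ddply never sees a. A minimal sketch of one way around this, assuming it is acceptable to pass the column name as a string: ddply also accepts a character vector for its grouping argument. The name sumfun_plyr is made up for this illustration.

```r
library(plyr)

# Same sample data as in the question
sample_df <- data.frame(a = rep(1:3, 2), b = rep(3:1, 2), c = rep(c("a", "b"), 3))

# Hypothetical wrapper: v is a column name given as a string.
# Passing the string directly (instead of .(v)) lets ddply group by that column.
sumfun_plyr <- function(df, v) {
  ddply(df, v, plyr::summarise, mean = mean(b), var = var(b))
}

res_plyr <- sumfun_plyr(sample_df, "a")
print(res_plyr)
```

This should give the same three groups as the direct ddply(sample_df, .(a), ...) call.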
# library(tidyverse)
library(dplyr)
What you are trying to accomplish can be translated like this:
sample_df %>%
group_by(a) %>%
summarize(mean = mean(b), var = var(b))
# # A tibble: 3 × 3
# a mean var
# <int> <dbl> <dbl>
# 1 1 3 0
# 2 2 2 0
# 3 3 1 0
And, for the function approach:
sumfun <- function(df, v) {
df %>%
group_by_(v) %>%
summarize(mean = mean(b), var = var(b))
}
sumfun(sample_df, 'a')
# # A tibble: 3 × 3
# a mean var
# <int> <dbl> <dbl>
# 1 1 3 0
# 2 2 2 0
# 3 3 1 0
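Note the trailing _ in group_by_ inside the function; it is needed for standard evaluation. See vignette("nse") for details. (The underscore verbs were deprecated in dplyr 0.7.0, so this form applies to older versions.)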
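With dplyr 1.0 or later, the same wrapper can be written without the underscore verb. The following is a sketch under that version assumption, using across() with all_of() to select the grouping column from a string; sumfun_new is a name chosen here only to avoid clashing with the function above.

```r
library(dplyr)

# Same sample data as in the question
sample_df <- data.frame(a = rep(1:3, 2), b = rep(3:1, 2), c = rep(c("a", "b"), 3))

# Hypothetical wrapper: v is a column name given as a string.
# all_of(v) selects the column named by the string; across() hands it to group_by().
sumfun_new <- function(df, v) {
  df %>%
    group_by(across(all_of(v))) %>%
    summarize(mean = mean(b), var = var(b))
}

res_new <- sumfun_new(sample_df, "a")
print(res_new)
```

The result should match the group_by_ version, with one row per value of a.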
Upvotes: 2