I have looked in several places, but I can't figure out how to do this. The approach seems to have changed a few times, which makes it even more confusing.
I want to summarise NumbOfBx by Endoscopist as part of a function. I have the following data frame:
vv <- structure(list(Endoscopist = c("John Boy ", "Jupi Ter ", "Jupi Ter ",
"John Boy ", "John Boy ", "John Boy ", "Mar Gret ", "John Boy ",
"Mar Gret ", "Phil Ip ", "Phil Ip "), NumbOfBx = c(2, 4, NA,
2, 12, 12, NA, NA, NA, 3, NA)), row.names = 100:110, .Names = c("Endoscopist",
"NumbOfBx"), class = "data.frame")
My function is:
NumBx <- function(x, y, z) {
x <- data.frame(x)
x <- x[!is.na(x[,y]), ]
NumBxPlot <- x %>% group_by_(z) %>% summarise(avg = mean(y, na.rm = T))
}
which I call with:
NumBx(vv, "Endoscopist", "NumbOfBx")
This gives me the following warnings:
Warning messages:
1: In mean.default(y, na.rm = T) :
argument is not numeric or logical: returning NA
2: In mean.default(y, na.rm = T) :
argument is not numeric or logical: returning NA
3: In mean.default(y, na.rm = T) :
argument is not numeric or logical: returning NA
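The warnings most likely come from mean() receiving the character string stored in `y` (a column name) rather than the column's values: `y` is not a column of the data, so summarise() falls back to the function argument. A minimal base-R reproduction, assuming the column name is passed as a string as in the call above:

```r
y <- "NumbOfBx"  # the column *name*, as passed to the function

# mean() on a character string warns and returns NA; capture the warning text
msg <- tryCatch(mean(y, na.rm = TRUE),
                warning = function(w) conditionMessage(w))
print(msg)

# The value itself is NA, matching the avg column the function produces
res <- suppressWarnings(mean(y, na.rm = TRUE))
print(res)
```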
I changed the function to use summarise_, but I get the same result. Then I realised that summarise_ specifically (not just group_by_) needs standard evaluation, so I tried this (from a Stack Overflow example):
library(lazyeval)
NumBx <- function(x, y, z) {
x <- data.frame(x)
x <- x[!is.na(x[,y]), ]
NumBxPlot <- x %>% group_by_(z) %>%
summarise_(sum_val = interp(~mean(y, na.rm = TRUE), var = as.name(y)))
}
but I still get the same warnings:
Warning messages:
1: In mean.default(y, na.rm = T) :
argument is not numeric or logical: returning NA
2: In mean.default(y, na.rm = T) :
argument is not numeric or logical: returning NA
3: In mean.default(y, na.rm = T) :
argument is not numeric or logical: returning NA
My intended output is:
Endoscopist Avg
Jupi Ter 4
John Boy 28
Phil Ip 3
Using rlang (the replacement for lazyeval), you could do:
library(dplyr)
vv <- structure(list(Endoscopist = c("John Boy ", "Jupi Ter ", "Jupi Ter ", "John Boy ", "John Boy ", "John Boy ", "Mar Gret ", "John Boy ", "Mar Gret ", "Phil Ip ", "Phil Ip "),
NumbOfBx = c(2, 4, NA, 2, 12, 12, NA, NA, NA, 3, NA)),
row.names = 100:110, .Names = c("Endoscopist", "NumbOfBx"), class = "data.frame")
num_bx <- function(.data, group, variable) {
# capture the unquoted column names as quosures
group <- enquo(group)
variable <- enquo(variable)
.data %>%
tidyr::drop_na(!!variable) %>%
group_by(!!group) %>%
summarise(avg = mean(!!variable))
}
vv %>% num_bx(Endoscopist, NumbOfBx)
#> # A tibble: 3 x 2
#> Endoscopist avg
#> <chr> <dbl>
#> 1 John Boy 7
#> 2 Jupi Ter 4
#> 3 Phil Ip 3
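On rlang 0.4.0 or later, the `{{ }}` ("curly-curly") operator combines enquo() and !! in a single step. A sketch of the same function written that way, assuming a recent dplyr; it swaps tidyr::drop_na() for filter(!is.na(...)) only to avoid the tidyr dependency, and rebuilds the same data with data.frame() so the block is self-contained:

```r
library(dplyr)

# Same data as in the question (note the trailing spaces in the names)
vv <- data.frame(
  Endoscopist = c("John Boy ", "Jupi Ter ", "Jupi Ter ", "John Boy ", "John Boy ",
                  "John Boy ", "Mar Gret ", "John Boy ", "Mar Gret ", "Phil Ip ",
                  "Phil Ip "),
  NumbOfBx = c(2, 4, NA, 2, 12, 12, NA, NA, NA, 3, NA),
  stringsAsFactors = FALSE
)

num_bx <- function(.data, group, variable) {
  .data %>%
    filter(!is.na({{ variable }})) %>%   # {{ }} = enquo() + !! in one step
    group_by({{ group }}) %>%
    summarise(avg = mean({{ variable }}))
}

res <- vv %>% num_bx(Endoscopist, NumbOfBx)
print(res)
```

The call site is unchanged: the column names are still passed unquoted.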
Or, if you want to pass the column names as strings instead of unquoted names:
num_bx <- function(.data, group, variable) {
# convert the column-name strings to symbols so they can be unquoted with !!
group <- rlang::sym(group)
variable <- rlang::sym(variable)
.data %>%
tidyr::drop_na(!!variable) %>%
group_by(!!group) %>%
summarise(avg = mean(!!variable))
}
vv %>% num_bx("Endoscopist", "NumbOfBx")
#> # A tibble: 3 x 2
#> Endoscopist avg
#> <chr> <dbl>
#> 1 John Boy 7
#> 2 Jupi Ter 4
#> 3 Phil Ip 3
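Another option for string arguments, assuming dplyr 1.0 or later, is the `.data` pronoun, which looks a column up by name without converting it to a symbol first. The function name `num_bx_chr` and the argument name `df` are hypothetical; `df` is used instead of `.data` so the argument does not share its name with the pronoun:

```r
library(dplyr)

# Same data as in the question (note the trailing spaces in the names)
vv <- data.frame(
  Endoscopist = c("John Boy ", "Jupi Ter ", "Jupi Ter ", "John Boy ", "John Boy ",
                  "John Boy ", "Mar Gret ", "John Boy ", "Mar Gret ", "Phil Ip ",
                  "Phil Ip "),
  NumbOfBx = c(2, 4, NA, 2, 12, 12, NA, NA, NA, 3, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: group and variable are character strings
num_bx_chr <- function(df, group, variable) {
  df %>%
    filter(!is.na(.data[[variable]])) %>%     # look the column up by name
    group_by(.data[[group]]) %>%
    summarise(avg = mean(.data[[variable]]))
}

res <- vv %>% num_bx_chr("Endoscopist", "NumbOfBx")
print(res)
```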
Following the dplyr programming vignette, define your function as follows:
NumBx <- function( x, y, z )
{
yy <- enquo( y )
zz <- enquo( z )
data.frame(x) %>% filter( !is.na(!!yy) ) %>% group_by( !!zz ) %>%
summarize( avg = mean(!!yy) )
}
You can now call it as:
NumBx( vv, NumbOfBx, Endoscopist )
# Endoscopist avg
# <chr> <dbl>
# 1 John Boy 7
# 2 Jupi Ter 4
# 3 Phil Ip 3
Some notes:
- Your function groups by z, but you were passing NumbOfBx as the z argument.
- na.rm=TRUE is redundant: you are already filtering out the rows where the y variable is NA.
- The average for John Boy should be 7, not 28 (the value stated in your intended output). 28 is the sum of his four non-missing values (2 + 2 + 12 + 12), not their mean.
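A quick base-R cross-check of the last two notes, using the same data rebuilt with data.frame() for brevity: 28 is the sum of John Boy's non-missing values, 7 is their mean, and filtering out NA first gives the same mean as na.rm = TRUE. aggregate() is swapped in here purely as an independent check, not as part of either answer:

```r
vv <- data.frame(
  Endoscopist = c("John Boy ", "Jupi Ter ", "Jupi Ter ", "John Boy ", "John Boy ",
                  "John Boy ", "Mar Gret ", "John Boy ", "Mar Gret ", "Phil Ip ",
                  "Phil Ip "),
  NumbOfBx = c(2, 4, NA, 2, 12, 12, NA, NA, NA, 3, NA),
  stringsAsFactors = FALSE
)

# John Boy's values (the names in the data end with a space)
john <- vv$NumbOfBx[vv$Endoscopist == "John Boy "]

john_sum  <- sum(john, na.rm = TRUE)    # 2 + 2 + 12 + 12 = 28
john_mean <- mean(john, na.rm = TRUE)   # 28 / 4 = 7

# Dropping NAs first gives the same mean, so na.rm = TRUE adds nothing after the filter
john_mean_filtered <- mean(john[!is.na(john)])

# All groups at once: the formula interface of aggregate() drops NA rows by default
# (na.action = na.omit), so Mar Gret, whose values are all NA, does not appear
agg <- aggregate(NumbOfBx ~ Endoscopist, data = vv, FUN = mean)
print(agg)
```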