Reputation: 1
To simplify, say, I have a dataset like this:
num = c(1,2,3,"NA",3,4,1,2,1)
char = c('a','b','s','s','s','s','a','s','s')
t = as.data.frame(cbind(num,char))
and I wrote a function to find top 5 values of each column:
func_top5 = function(x){
  t %>%
    filter(!is.na(x)) %>%
    group_by(x) %>%
    summarise(number_of_same_value = n()) %>%
    arrange(desc(number_of_same_value)) %>%
    slice(1:5)
}
When I tried to apply this function to each column of the data frame,
apply(t,2,func_top5)
it returned the error:
Error in grouped_df_impl(data, unname(vars), drop) :
  Column `x` is unknown
But when I run the same pipeline directly on a single column, it works fine:
t%>%
filter(!is.na(num))%>%
group_by(num)%>%
summarise(number_of_same_value = n())%>%
arrange(desc(number_of_same_value))%>%
slice(1:5)
# A tibble: 5 x 2
     num number_of_same_value
  <fctr>                <int>
1      1                    3
2      2                    2
3      3                    2
4      4                    1
5     NA                    1
I think the problem might be the "group_by" function.
Can anyone help me with this?
Upvotes: 0
Views: 852
Reputation: 886948
We can use the quosure approach to solve this. Assuming that the input argument 'x' is unquoted, we can convert it to a quosure with enquo, then evaluate it within group_by and filter using the bang-bang operator (!!). Note that it is better to pass the dataset object as an input argument too, so that the function can be used in a more general way. It is not clear whether the missing values are quoted strings or not. If they are true NA values, the more acceptable way is to filter with is.na.
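library(dplyr)

func_top5 <- function(df, x){
  # capture the unquoted column name as a quosure
  x <- enquo(x)
  df %>%
    # drop the string "NA" and empty strings; the parentheses keep !! bound to x only
    filter(!((!!x) %in% c("NA", ""))) %>%
    group_by(!!x) %>%
    summarise(number_of_same_value = n()) %>%
    arrange(desc(number_of_same_value)) %>%
    slice(1:5)
}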
We call it by
func_top5(df1, col1)
# A tibble: 2 x 2
#   col1  number_of_same_value
#   <chr>                <int>
# 1 b                        3
# 2 a                        2
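If the missing values are true NA rather than the string "NA", the filter step can use is.na instead. The following is only a minimal sketch of that variant: the name func_top5_na and the example data df2 are made up for illustration and are not part of the original answer.

```r
library(dplyr)

# Hypothetical variant of func_top5 for columns whose missing values are real NA
func_top5_na <- function(df, x) {
  # capture the unquoted column name as a quosure
  x <- enquo(x)
  df %>%
    filter(!is.na(!!x)) %>%                       # drop true NA values
    group_by(!!x) %>%
    summarise(number_of_same_value = n()) %>%
    arrange(desc(number_of_same_value)) %>%
    slice(1:5)
}

# Made-up example data with a true NA in col1
df2 <- data.frame(col1 = c("a", "b", NA, "a", "b", "b"),
                  stringsAsFactors = FALSE)
res_na <- func_top5_na(df2, col1)
res_na
```

With this data, the NA row is dropped before counting, so only "b" (3 occurrences) and "a" (2 occurrences) should remain.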
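One option to do this on multiple columns is to loop over the column names with purrr::map and convert each name to a symbol. Note that t1 is not defined in this post; judging by the output below, it is presumably the question's data with num stored as numeric (with a true NA) and char as character:
library(purrr)
map(names(t1), ~ func_top5(t1, !! rlang::sym(.x)))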
#[[1]]
# A tibble: 5 x 2
#     num number_of_same_value
#   <dbl>                <int>
# 1  1.00                    3
# 2  2.00                    2
# 3  3.00                    2
# 4  4.00                    1
# 5 NA                       1
#
#[[2]]
# A tibble: 3 x 2
#   char  number_of_same_value
#   <chr>                <int>
# 1 s                        6
# 2 a                        2
# 3 b                        1
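When the column names are already available as strings, as they are when looping over names(), the conversion to a symbol can also be moved inside the function. This is a rough sketch under a few assumptions: func_top5_chr and df3 are hypothetical names, df3 is a guess at what the data behind the output above may look like (numeric num with a true NA), and missing values are dropped with is.na.

```r
library(dplyr)
library(purrr)

# Hypothetical wrapper that takes the column name as a string
func_top5_chr <- function(df, col) {
  col <- rlang::sym(col)                          # string -> symbol
  df %>%
    filter(!is.na(!!col)) %>%                     # drop true NA values
    group_by(!!col) %>%
    summarise(number_of_same_value = n()) %>%
    arrange(desc(number_of_same_value)) %>%
    slice(1:5)
}

# Assumed example data: numeric column with a true NA, plus a character column
df3 <- data.frame(num  = c(1, 2, 3, NA, 3, 4, 1, 2, 1),
                  char = c("a", "b", "s", "s", "s", "s", "a", "s", "s"),
                  stringsAsFactors = FALSE)

# set_names() names the result list after the columns
res <- map(set_names(names(df3)), ~ func_top5_chr(df3, .x))
res
```

Because the NA is filtered out here, the num table should have four rows rather than the five shown above, where NA was counted as its own group.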
Data used for the func_top5(df1, col1) call above:
df1 <- data.frame(col1 = c("a", "b", "NA", "", "a", "b", "b"),
                  col2 = rnorm(7), stringsAsFactors = FALSE)
Upvotes: 1