Reputation: 69
I don't have much experience writing functions in R. I'm trying to build one that calculates the mean of a target variable (in my example: funded_final) within each group.
My data:
residential_status funded_final
Living with parents 0
Rent 0
Rent 0
Own 1
Own 0
Own 0
Rent 0
Rent 0
Rent 0
Living with parents 0
Rent 0
Rent 0
Rent 1
When I do this outside a function, it works great:
test2 %>%
  group_by(residential_status) %>%
  summarise(tar_average = round((mean(funded_final, na.rm = TRUE)) * 100, 2), N = n()) %>%
  arrange(desc(tar_average)) %>%
  mutate(Perc = round((N / sum(N)) * 100, 2), Cum_Perc = cumsum(Perc)) %>%
  print(n = nrow(.))
The results:
residential_status tar_average N Perc Cum_Perc
<fctr> <dbl> <int> <dbl> <dbl>
1 Own 33.33 3 23.08 23.08
2 Rent 12.50 8 61.54 84.62
3 Living with parents 0.00 2 15.38 100.00
When I use the function, I just get the total average:
group.by.func <- function(dataframe, target) {
  dataframe %>%
    group_by(residential_status) %>%
    summarise(tar_average = round((mean(target, na.rm = TRUE)) * 100, 2), N = n()) %>%
    arrange(desc(tar_average)) %>%
    mutate(Perc = round((N / sum(N)) * 100, 2), Cum_Perc = cumsum(Perc)) %>%
    print(n = nrow(.))
}
group.by.func(test2,test2$funded_final)
Results:
residential_status tar_average N Perc Cum_Perc
<fctr> <dbl> <int> <dbl> <dbl>
1 Living with parents 15.38 2 15.38 15.38
2 Own 15.38 3 23.08 38.46
3 Rent 15.38 8 61.54 100.00
Thanks in advance!
Upvotes: 1
Views: 821
Reputation: 1790
The problem is that dplyr::summarise uses non-standard evaluation and expects column names as unquoted (bare) names. In your case, the variable target is not a column name but a vector containing the values of the column. The function has no way of associating the vector with the data.frame, so the grouping does not apply to target. For every group of the grouped data.frame, the mean is taken over the entire vector target, which is why each row shows the overall average of 15.38.
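To make the difference concrete, here is a small base R sketch using the 13 rows from the question. It is only an illustration: the names overall and by_group are made up for this example, and tapply stands in for the grouped summarise.

# Values copied from the data in the question.
residential_status <- c(
  "Living with parents", "Rent", "Rent", "Own", "Own", "Own", "Rent",
  "Rent", "Rent", "Living with parents", "Rent", "Rent", "Rent"
)
funded_final <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1)

# What the function effectively computes for every group:
# the mean over the whole vector (2 of 13 rows are 1).
overall <- round(mean(funded_final, na.rm = TRUE) * 100, 2)

# What was intended: the mean within each group.
by_group <- round(
  tapply(funded_final, residential_status, mean, na.rm = TRUE) * 100, 2
)

print(overall)
print(by_group)

The first value matches the 15.38 repeated in every row of the function's output, while the per-group values match the result obtained outside the function.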
You could solve it by passing the column name as a string and using summarise_, the 'standard evaluation' version of dplyr::summarise:
group.by.func <- function(dataframe, target) {
  dataframe %>%
    group_by(residential_status) %>%
    summarise_(.dots = list(
      tar_average = paste0("round((mean(", target, ", na.rm=TRUE))*100,2)"),
      N = "n()")) %>%
    arrange(desc(tar_average)) %>%
    mutate(Perc = round((N / sum(N)) * 100, 2), Cum_Perc = cumsum(Perc)) %>%
    print(n = nrow(.))
}
group.by.func(test2,"funded_final")
Results:
# A tibble: 3 × 5
residential_status tar_average N Perc Cum_Perc
<fctr> <dbl> <int> <dbl> <dbl>
1 Own 33.33 3 23.08 23.08
2 Rent 12.50 8 61.54 84.62
3 Living with parents 0.00 2 15.38 100.00
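Note that the underscore verbs such as summarise_ have been deprecated since dplyr 0.7.0. As a possible alternative, here is a sketch that keeps the string argument but looks the column up with the .data pronoun. It assumes a reasonably recent dplyr; the function name group_by_target is made up for this example, and test2 is rebuilt from the data in the question.

library(dplyr)

# Rebuild the example data from the question (13 rows).
test2 <- data.frame(
  residential_status = factor(c(
    "Living with parents", "Rent", "Rent", "Own", "Own", "Own", "Rent",
    "Rent", "Rent", "Living with parents", "Rent", "Rent", "Rent"
  )),
  funded_final = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)

# `target` is a column name given as a string. .data[[target]] is resolved
# inside each group of the data, not in the calling environment.
group_by_target <- function(dataframe, target) {
  dataframe %>%
    group_by(residential_status) %>%
    summarise(
      tar_average = round(mean(.data[[target]], na.rm = TRUE) * 100, 2),
      N = n()
    ) %>%
    arrange(desc(tar_average)) %>%
    mutate(
      Perc = round(N / sum(N) * 100, 2),
      Cum_Perc = cumsum(Perc)
    )
}

res <- group_by_target(test2, "funded_final")
print(res)

Unlike the version above, this sketch returns the tibble instead of printing it inside the function, so the caller decides how to display it.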
Upvotes: 1