Reputation: 3
I'm new to R, and I'm trying to write a function that will add the entries of a data frame column by row, and return the data frame with
Here's a sample df of my data:
Ethnicity <- c('A', 'B', 'H', 'N', 'O', 'W', 'Unknown')
Texas <- c(2,41,56,1,3,89,7)
Tenn <- c(1,9,2,NA,1,32,3)
df <- data.frame(Ethnicity, Texas, Tenn)
When I directly try the following code, the columns are summed by row as desired:
new_df <- df %>% rowwise() %>%
mutate(TN_TX = sum(Tenn, Texas, na.rm = TRUE))
But when I try to use my function code, rowwise() seems not to work. My function code is:
df.sum.col <- function(df.in, col.1, col.2) {
if(is.data.frame(df.in) != TRUE){ #warning if first arg not df
warning('df.in is not a dataframe')}
if(is.numeric(col.1) != TRUE){
warning('col.1 is not a numeric vector')}
if(is.numeric(col.2) != TRUE){
warning('col.2 is not a numeric vector')} #warning if col not numeric
df.out <- rowwise(df.in) %>%
mutate(name = sum(col.1, col.2, na.rm = TRUE))
df.out
}
bad_df <- df.sum.col(df, Texas, Tenn)
This results in
.
I don't understand why the core of the function works outside it but not within. I also tried piping df.in to rowwise() like this:
df.out <- df.in %>% rowwise() %>%
mutate(name = sum(col.1, col.2, na.rm = TRUE))
But that doesn't resolve the problem.
As far as naming the new column, I tried doing so by adding the name as an argument, but didn't have any success. Thoughts on this?
Any help appreciated!
Upvotes: 0
Views: 1992
Reputation: 12640
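As suggested by @thelatemail, it's down to non-standard evaluation; rowwise() has nothing to do with it. Inside your function, mutate() looks for columns literally named col.1 and col.2 in the data frame, finds none, and falls back to the function arguments, which evaluate to the whole Texas and Tenn vectors from your global environment, so every row gets the same grand total. You need to rewrite your function to use mutate_. It can be tricky to understand, but here's one version of what you're trying to do: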
library(dplyr)
df <- tibble::tribble(
~Ethnicity, ~Texas, ~Tenn,
"A", 2, 1,
"B", 41, 9,
"H", 56, 2,
"N", 1, NA,
"O", 3, 1,
"W", 89, 32,
"Unknown", 7, 3
)
df.sum.col <- function(df.in, col.1, col.2, name) {
  if(is.data.frame(df.in) != TRUE){ #warning if first arg not df
    warning('df.in is not a dataframe')}
  if(is.numeric(lazyeval::lazy_eval(substitute(col.1), df.in)) != TRUE){
    warning('col.1 is not a numeric vector')}
  if(is.numeric(lazyeval::lazy_eval(substitute(col.2), df.in)) != TRUE){
    warning('col.2 is not a numeric vector')} #warning if col not numeric
  dots <- setNames(list(lazyeval::interp(~sum(x, y, na.rm = TRUE),
                                         x = substitute(col.1), y = substitute(col.2))),
                   name)
  df.out <- rowwise(df.in) %>%
    mutate_(.dots = dots)
  df.out
}
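Note that mutate_() and the lazyeval approach have been deprecated since dplyr 0.7.0. As a rough sketch only, the same idea can be written with tidy evaluation, assuming a reasonably recent dplyr and rlang (0.4.0 or later for the `{{ }}` operator). The function name sum_two_cols and its argument names are made up for illustration, and the data frame is rebuilt here so the snippet runs on its own:

```r
library(dplyr)

# Sample data from the question, rebuilt so this snippet is self-contained.
df <- data.frame(
  Ethnicity = c("A", "B", "H", "N", "O", "W", "Unknown"),
  Texas = c(2, 41, 56, 1, 3, 89, 7),
  Tenn = c(1, 9, 2, NA, 1, 32, 3)
)

# Hypothetical helper (not from the original answer).
# col_1 and col_2 are passed as bare column names; name is a string.
sum_two_cols <- function(df_in, col_1, col_2, name) {
  stopifnot(is.data.frame(df_in))
  df_in %>%
    rowwise() %>%
    # {{ }} forwards the caller's column names into mutate();
    # !!name := sets the new column's name from the string.
    mutate(!!name := sum({{ col_1 }}, {{ col_2 }}, na.rm = TRUE)) %>%
    ungroup()
}

out <- sum_two_cols(df, Texas, Tenn, "TN_TX")
```

Here the column names are never evaluated in the function's own environment, so the lookup happens inside the data frame row by row, which is what the direct rowwise() call in the question was doing.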
In practice, you shouldn't need to use rowwise() at all here; you can use rowSums() after selecting only the columns you need to sum.
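A minimal sketch of the rowSums() route, using base R only. The helper name sum_cols and its arguments are hypothetical, and the columns are passed as strings here, which sidesteps non-standard evaluation entirely:

```r
# Sample data from the question, rebuilt so this snippet is self-contained.
df <- data.frame(
  Ethnicity = c("A", "B", "H", "N", "O", "W", "Unknown"),
  Texas = c(2, 41, 56, 1, 3, 89, 7),
  Tenn = c(1, 9, 2, NA, 1, 32, 3)
)

# Hypothetical helper: cols is a character vector of column names,
# name is the name of the new column.
sum_cols <- function(df_in, cols, name) {
  stopifnot(is.data.frame(df_in), all(cols %in% names(df_in)))
  # Select only the columns to add, then sum across each row.
  df_in[[name]] <- unname(rowSums(df_in[, cols, drop = FALSE], na.rm = TRUE))
  df_in
}

out2 <- sum_cols(df, c("Tenn", "Texas"), "TN_TX")
```

Because rowSums() is vectorised over rows, this avoids the per-row overhead of rowwise() and extends to any number of columns.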
Upvotes: 1