Reputation: 2150
I have a simple data frame with 3 columns: name, goal, and actual. Because this is a simplification of a much larger data frame, I want to use dplyr to compute the number of times each person has met a goal.
df <- data.frame(name = c(rep('Fred', 3), rep('Sally', 4)),
goal = c(4,6,5,7,3,8,5), actual=c(4,5,5,3,3,6,4))
The result should look like this:
name met_goal
Fred 2
Sally 1
I should be able to pass an anonymous function similar to what is shown below, but don't have the syntax quite right:
library(dplyr)
g <- group_by(df, name)
summ <- summarise(g, met_goal = sum((function(x,y) {
if(x>y){return(0)}
else{return(1)}
})(goal, actual)
)
)
When I run the code above, I see 3 of these warnings:
Warning messages:
1: In if (x > y) { : the condition has length > 1 and only the first element will be used
Upvotes: 6
Views: 2006
Reputation: 2150
Found myself needing to do something similar to this again (a year later) but with a more complex function than the simple one provided in the original question. The originally accepted answer took advantage of a specific feature of the problem, but the more general approach was touched on here. Using this approach, the answer I was ultimately after was something like this:
library(dplyr)
df <- data.frame(name = c(rep('Fred', 3), rep('Sally', 4)),
goal = c(4,6,5,7,3,8,5), actual=c(4,5,5,3,3,6,4))
my_func = function(act, goa) {
if(act < goa) {
return(0)
} else {
return(1)
}
}
# Apply my_func to each (actual, goal) pair, then total the results per name
summ = df %>% group_by(name) %>%
  summarise(met_goal = sum(mapply(my_func, .data$actual, .data$goal)))
> summ
# A tibble: 2 x 2
name met_goal
<fct> <dbl>
1 Fred 2
2 Sally 1
The original question referred to using an anonymous function. In that spirit, the last part would look like this:
summ = df %>% group_by(name) %>%
summarise(met_goal = sum(mapply(function(act, go) {
if(act < go) {
return(0)
} else {
return(1)
}
}, .data$actual, .data$goal)))
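If the scalar function is reused in several places, base R's Vectorize() can wrap it once so that it accepts whole columns. Internally it relies on mapply(), so this is the same idea as above rather than a faster one. A minimal sketch, where my_func_v is just an illustrative name and base R's tapply() stands in for group_by()/summarise() so that it runs without dplyr:

```r
df <- data.frame(name = c(rep('Fred', 3), rep('Sally', 4)),
                 goal = c(4,6,5,7,3,8,5), actual = c(4,5,5,3,3,6,4))

# Scalar function from the answer above: handles one pair at a time
my_func <- function(act, goa) {
  if (act < goa) 0 else 1
}

# Vectorize() returns a wrapper that calls my_func element by element
# (my_func_v is a hypothetical name, not part of the original answer)
my_func_v <- Vectorize(my_func)

# One 0/1 flag per row, then a per-name total
met <- my_func_v(df$actual, df$goal)
totals <- tapply(met, df$name, sum)
print(totals)
```

The wrapped function should also be usable inside summarise(), for example as sum(my_func_v(actual, goal)).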
Upvotes: 0
Reputation: 99371
We have equal-length vectors in goal and actual, so the relational operators are appropriate to use here. However, when we use them in a simple if() statement we may get unexpected results because if() expects length-1 vectors. Since we have equal-length vectors and we require a binary result, taking the sum of the logical vector is the best approach, as follows.
group_by(df, name) %>%
summarise(met_goal = sum(goal <= actual))
# A tibble: 2 x 2
name met_goal
<fctr> <int>
1 Fred 2
2 Sally 1
The operator is switched to <= because you want 0 for goal > actual and 1 otherwise.
Note that you can use an anonymous function. It was the if() statement that was throwing you off. For example, using
sum((function(x, y) x <= y)(goal, actual))
would work in the manner you are asking about.
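To see why the vectorised comparison works where if() does not, here is a minimal base-R sketch on the question's data, outside of dplyr. The names met and met_ifelse are only for illustration. ifelse() is the element-wise counterpart of if/else, in case the 0/1 coding needs to be spelled out:

```r
goal   <- c(4, 6, 5, 7, 3, 8, 5)
actual <- c(4, 5, 5, 3, 3, 6, 4)

# A comparison of two equal-length vectors returns one logical per element
met <- goal <= actual
print(met)

# sum() counts TRUE as 1 and FALSE as 0
print(sum(met))

# ifelse() applies the if/else logic from the question to every element
met_ifelse <- ifelse(goal > actual, 0, 1)
print(met_ifelse)
```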
Upvotes: 4
Reputation: 28379
Solution using data.table:
You asked for a dplyr solution, but since the actual data is much larger, you can use data.table. foo is the function you want to apply.
foo <- function(x, y) {
res <- 0
if (x <= y) {
res <- 1
}
return(res)
}
library(data.table)
setDT(df)
setkey(df, name)[, foo(goal, actual), .(name, 1:nrow(df))][, sum(V1), name]
If you prefer pipes then you can use this:
library(magrittr)
setDT(df) %>%
setkey(name) %>%
.[, foo(goal, actual), .(name, 1:nrow(.))] %>%
.[, .(met_goal = sum(V1)), name]
name met_goal
1: Fred 2
2: Sally 1
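Since the motivation here is speed on larger data, it may be worth noting that grouping by row number calls foo once per row. When the function can be written with vectorised operators, as it can in this question, a single grouped expression avoids that. A sketch assuming the data.table package is installed, with dt and res as illustrative names:

```r
library(data.table)

dt <- data.table(name = c(rep('Fred', 3), rep('Sally', 4)),
                 goal = c(4,6,5,7,3,8,5), actual = c(4,5,5,3,3,6,4))

# goal <= actual is evaluated on each group's columns as whole vectors,
# so no per-row function call is needed
res <- dt[, .(met_goal = sum(goal <= actual)), by = name]
print(res)
```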
Upvotes: 2