Reputation: 409
I'm probably doing something stupid here but would appreciate some help. I'm trying to classify some data that has been incorrectly filled in.
df <- data.frame(ID = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
                 headache_y_n = c("Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No"),
                 headache_days = c("2", "2", "2", "2", "2", "1", "1", "1", "1", "1"))
I want to be able to say: if headache_y_n is "Yes" more than 3 times per ID, then that ID meets the criteria for "prolonged"; otherwise it should be "short".
Therefore, I want the following output:
output <- data.frame(ID = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
                     headache_y_n = c("Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No"),
                     headache_days = c("2", "2", "2", "2", "2", "1", "1", "1", "1", "1"),
                     criteria = c("prolonged", "prolonged", "prolonged", "prolonged", "prolonged", "short", "short", "short", "short", "short"))
My code is as follows:
library(dplyr)
df %>% group_by(ID) %>% mutate(criteria=case_when(
sum(any(headache_y_n=="Yes") >= 3) ~ "prolonged",
TRUE ~ "short"
))
Unfortunately it doesn't work and I get the following error:
Error: Problem with `mutate()` input `criteria`.
x LHS of case 1 (`sum(any(headache_y_n == "Yes") >= 3)`) must be a logical vector, not an integer vector.
ℹ Input `criteria` is `case_when(...)`.
ℹ The error occurred in group 1: ID = "A".
I'm not smart enough to figure out where I'm going wrong, hence why I'm asking kindly for your help!
Thanks!
Upvotes: 2
Views: 827
Reputation: 887511
The `any` and `sum` should be switched. After grouping by 'ID', we count the number of 'Yes' values, i.e. take the `sum` of the logical expression (`headache_y_n == 'Yes'`), and then compare that sum with `>= 3`. The comparison can be wrapped in `any` to mirror the original code, but that is probably not needed here, as the `sum` is only a single value per group.
library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(criteria = case_when(
    any(sum(headache_y_n == "Yes") >= 3) ~ "prolonged",
    TRUE ~ "short"
  ))
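To see why the order matters, it may help to evaluate both versions by hand for a single group. The sketch below uses base R only and copies the `headache_y_n` values for ID "A" from the question; the variable names (`headache_a`, `step_any`, and so on) are made up for illustration.

```r
# Values of headache_y_n for ID "A", copied from the question's data
headache_a <- c("Yes", "Yes", "Yes", "No", "Yes")

# Original order: any() first collapses the whole vector to a single TRUE
step_any <- any(headache_a == "Yes")   # TRUE
step_cmp <- step_any >= 3              # TRUE is coerced to 1, and 1 >= 3 is FALSE
original <- sum(step_cmp)              # sum() of a logical gives an integer: 0L

# Switched order: count the "Yes" values first, then compare the count
n_yes    <- sum(headache_a == "Yes")   # 4
switched <- n_yes >= 3                 # TRUE, a length-1 logical

print(class(original))  # "integer"
print(class(switched))  # "logical"
```

The original expression ends in `sum()`, so it hands `case_when()` an integer rather than a logical condition, which is what the error message complains about. The switched expression ends in the comparison, so it is a logical.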
Even without the `any` wrapper, the result is the same:
df %>%
  group_by(ID) %>%
  mutate(criteria = case_when(
    sum(headache_y_n == "Yes") >= 3 ~ "prolonged",
    TRUE ~ "short"
  ))
# A tibble: 10 x 4
# Groups:   ID [2]
#   ID    headache_y_n headache_days criteria
#   <chr> <chr>        <chr>         <chr>
# 1 A     Yes          2             prolonged
# 2 A     Yes          2             prolonged
# 3 A     Yes          2             prolonged
# 4 A     No           2             prolonged
# 5 A     Yes          2             prolonged
# 6 B     No           1             short
# 7 B     No           1             short
# 8 B     No           1             short
# 9 B     Yes          1             short
#10 B     No           1             short
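Since there are only two outcomes, one possible variation is to store the per-group count in its own column and label it with `if_else()`. This is a sketch rather than part of the answer above: the `n_yes` column and the `min_yes` variable are hypothetical names, and the threshold of 3 is taken from the `>= 3` in the code. Note that the question's wording ("more than 3 times") would correspond to `> 3`; for this data both give the same labels, because ID "A" has 4 "Yes" values and ID "B" has 1.

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(
  ID            = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
  headache_y_n  = c("Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No"),
  headache_days = c("2", "2", "2", "2", "2", "1", "1", "1", "1", "1")
)

# Hypothetical threshold variable; 3 matches the `>= 3` used in the answer
min_yes <- 3

result <- df %>%
  group_by(ID) %>%
  mutate(
    n_yes    = sum(headache_y_n == "Yes"),   # count of "Yes" within each ID
    criteria = if_else(n_yes >= min_yes, "prolonged", "short")
  ) %>%
  ungroup()                                  # drop the grouping afterwards

print(result)
```

Keeping `n_yes` as a column makes it easy to check the counts behind each label, and `ungroup()` avoids carrying the grouping into later steps.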
Upvotes: 2