Reputation: 97
I am trying to write a code that compares two columns in the same dataframe, using the summarise to create a new column that will state if the ID was registered before the Review took place.
This is my dataframe:
tt <- structure(list(ID = c("P40", "P40", "P40", "P42", "P42", "P43", "P43",
"P44", "P44"),Type = c("Pre-Initial", "Review", "Review", "Initial", "Review", "Initial", "Review", "Pre-Initial", "Review"),
Registered = c("Yes", "", "", "No", "", "Yes", "", "No", "")),
class = "data.frame", row.names = c(NA, -9L))
Result I want to achieve:
ID Outcome
P40 Yes
P42 No
P43 Yes
P44 No
This is the code I have tried but only shows No for all IDs
tt %>% group_by(ID) %>%
summarise(outcome = c("No", "Yes")[all(Registered == "Yes" & Type == "Review") + 1])
Upvotes: 0
Views: 102
Reputation: 14774
Could try:
tt %>%
group_by(ID) %>%
summarise(
Outcome = c("No", "Yes")[any(Type == "Review" & cumsum(Registered == "Yes") == 1) + 1]
)
Output:
# A tibble: 4 x 2
ID Outcome
<chr> <chr>
1 P40 Yes
2 P42 No
3 P43 Yes
4 P44 No
Note that this assumes the Yes
for Registered
occurs exactly once per each ID
. Otherwise just replace cumsum(Registered == "Yes") == 1
with cumsum(Registered == "Yes") >= 1
.
Upvotes: 2
Reputation: 389235
Another dplyr
variant, here it returns "No"
if there is no value in Registered
as "Yes"
or it compares the index of occurrence of it and "Review"
and assign values accordingly.
library(dplyr)
tt %>%
group_by(ID) %>%
summarise(Outcome = if (any(Registered == "Yes"))
c("No", "Yes")[(which.max(Registered == "Yes") <
which.max(Type == "Review"))+1] else "No")
# ID Outcome
# <chr> <chr>
#1 P40 Yes
#2 P42 No
#3 P43 Yes
#4 P44 No
Upvotes: 2
Reputation: 546053
I’m not exactly sure what your intended result is but from your description it sounds as if the Type == 'Review'
rows are simply irrelevant: All you need is to remove them, and then remove that column (and rename the Registered
column):
tt %>%
filter(Type != 'Review') %>%
select(- Type, Outcome = Registered)
Upvotes: 0