Reputation: 2097
Using the example below, I want to group the dataframe by CaseWorker, then Client, then determine for each Client group whether the list of tasks in "Task" is the same as the list of tasks in "Task2".
I would be happy with a simple true or false, or better yet, if each task that is in "Task2" but not in "Task" could be extracted and displayed in a new column or dataframe.
So basically I need to make sure "Task" and "Task2" contain the same entries for each individual Client.
I would like to stick with dplyr and stringr if possible, or at least stay within the tidyverse. I'm thinking there's some way of using "group_by" and "str_detect" or some other stringr functionality to achieve this in an elegant manner.
CaseWorker<-c("John","John","John","John","John","John","Melanie","Melanie","Melanie","Melanie","Melanie","Melanie")
Client<-c("Chris","Chris","Chris","Tom","Tom","Tom","Valerie","Valerie","Valerie","Tim","Tim","Tim")
Task<-c("Feed cat","Make dinner","Iron shirt","Make dinner","Do homework","Make lunch","Make dinner","Feed cat","Buy groceries","Do homework","Iron shirt","Make lunch")
Task2<-c("Feed cat","Make dinner","Iron shirt","Make dinner","Do homework","Feed cat","Make dinner","Feed cat","Iron shirt","Do homework","Iron shirt","Make lunch")
Df<-data.frame(CaseWorker,Client,Task,Task2)
Upvotes: 0
Views: 321
Reputation: 2722
This might just be me misinterpreting the question, but I think you might be over-complicating this in the event that what you want is simply the records where Task does not match Task2.
> Df[which(Df$Task != Df$Task2),]
  CaseWorker  Client          Task      Task2
6       John     Tom    Make lunch   Feed cat
9    Melanie Valerie Buy groceries Iron shirt
Upvotes: 0
Reputation: 368
If you would like to use the stringr package, the following could also work for you.
Df %>%
group_by(CaseWorker,Client) %>%
mutate(Check=str_detect(as.character(Task),as.character(Task2)))
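One thing to keep in mind with this approach: str_detect compares each Task with the Task2 value in the same row, treats the pattern as a regular expression, and returns TRUE for substring matches. A minimal sketch of a literal, optionally case-insensitive variant using fixed() follows. The small data frame and the column names Check and Check_nocase are made up for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data, not the data from the question
Df <- data.frame(
  Task  = c("Feed cat", "Make lunch", "Iron shirt"),
  Task2 = c("feed cat", "Feed cat", "Iron shirt"),
  stringsAsFactors = FALSE
)

# str_detect() is vectorised over both string and pattern, so row i of Task
# is tested against row i of Task2.
# fixed() makes the pattern a literal string rather than a regular expression.
checked <- Df %>%
  mutate(
    Check        = str_detect(Task, fixed(Task2)),
    Check_nocase = str_detect(Task, fixed(Task2, ignore_case = TRUE))
  )

print(checked)
```

Because this is still a substring test, a plain Task == Task2 comparison is probably the safer choice when whole entries have to be identical.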
Upvotes: 0
Reputation: 8072
You can do this simply with dplyr and %in%:
Df %>%
group_by(CaseWorker,Client) %>%
mutate(Check = Task %in% Task2)
This hinges on exact case matching. If you're worried about that, you could do the following:
Df %>%
group_by(CaseWorker,Client) %>%
rowwise() %>%
mutate(Check = grepl(Task, Task2, ignore.case = TRUE))
Note that you have to use rowwise before the mutate, because grepl is not vectorized over its pattern argument: given a pattern vector longer than one, it only uses the first element.
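If the goal is one answer per Client rather than one per row, the comparison can also be done on the sets of tasks within each group. The following is only a sketch, assuming dplyr 1.0 or later (for the .groups argument); the names per_client, same_tasks and only_in_Task2 are made up for illustration. It uses the data from the question.

```r
library(dplyr)

# Data from the question
CaseWorker <- c("John","John","John","John","John","John","Melanie","Melanie","Melanie","Melanie","Melanie","Melanie")
Client <- c("Chris","Chris","Chris","Tom","Tom","Tom","Valerie","Valerie","Valerie","Tim","Tim","Tim")
Task <- c("Feed cat","Make dinner","Iron shirt","Make dinner","Do homework","Make lunch","Make dinner","Feed cat","Buy groceries","Do homework","Iron shirt","Make lunch")
Task2 <- c("Feed cat","Make dinner","Iron shirt","Make dinner","Do homework","Feed cat","Make dinner","Feed cat","Iron shirt","Do homework","Iron shirt","Make lunch")
Df <- data.frame(CaseWorker, Client, Task, Task2, stringsAsFactors = FALSE)

per_client <- Df %>%
  group_by(CaseWorker, Client) %>%
  summarise(
    # TRUE when both columns hold the same set of tasks for this client
    same_tasks    = base::setequal(Task, Task2),
    # tasks that appear in Task2 but not in Task, collapsed into one string
    only_in_Task2 = paste(base::setdiff(Task2, Task), collapse = ", "),
    .groups = "drop"
  )

print(per_client)
```

Unlike a row-by-row comparison, this ignores the order of the tasks within a client and any duplicates.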
Upvotes: 1
Reputation: 2424
See if this is what you're after.
First, see if Task matches Task2. If not, return Task2 as a new variable. I stored this into a new data frame df2
df2 <- Df %>%
mutate(match = Task == Task2,
non_match = ifelse(!match, Task2, ""))
df2
#    CaseWorker  Client          Task       Task2 match  non_match
# 1        John   Chris      Feed cat    Feed cat  TRUE
# 2        John   Chris   Make dinner Make dinner  TRUE
# 3        John   Chris    Iron shirt  Iron shirt  TRUE
# 4        John     Tom   Make dinner Make dinner  TRUE
# 5        John     Tom   Do homework Do homework  TRUE
# 6        John     Tom    Make lunch    Feed cat FALSE   Feed cat
# 7     Melanie Valerie   Make dinner Make dinner  TRUE
# 8     Melanie Valerie      Feed cat    Feed cat  TRUE
# 9     Melanie Valerie Buy groceries  Iron shirt FALSE Iron shirt
# 10    Melanie     Tim   Do homework Do homework  TRUE
# 11    Melanie     Tim    Iron shirt  Iron shirt  TRUE
# 12    Melanie     Tim    Make lunch  Make lunch  TRUE
Then summarise the results to see if individual CaseWorker/Client pairs match for all entries.
df2 %>%
group_by(CaseWorker, Client) %>%
summarise(n = n(),
matches = sum(match),
all_match = n == matches)
#   CaseWorker Client      n matches all_match
#   <chr>      <chr>   <int>   <int> <lgl>
# 1 John       Chris       3       3 TRUE
# 2 John       Tom         3       2 FALSE
# 3 Melanie    Tim         3       3 TRUE
# 4 Melanie    Valerie     3       2 FALSE
You could then of course merge this back into your data frame if you needed the all_match variable in your original dataset.
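A possible way to do that merge is a left_join on the two grouping columns. This is a sketch under the assumption of dplyr 1.0 or later (for the .groups argument); the names group_summary and merged are made up for illustration, and all(match) is used as a shorter equivalent of comparing n with the number of matches.

```r
library(dplyr)

# Data from the question
CaseWorker <- c("John","John","John","John","John","John","Melanie","Melanie","Melanie","Melanie","Melanie","Melanie")
Client <- c("Chris","Chris","Chris","Tom","Tom","Tom","Valerie","Valerie","Valerie","Tim","Tim","Tim")
Task <- c("Feed cat","Make dinner","Iron shirt","Make dinner","Do homework","Make lunch","Make dinner","Feed cat","Buy groceries","Do homework","Iron shirt","Make lunch")
Task2 <- c("Feed cat","Make dinner","Iron shirt","Make dinner","Do homework","Feed cat","Make dinner","Feed cat","Iron shirt","Do homework","Iron shirt","Make lunch")
Df <- data.frame(CaseWorker, Client, Task, Task2, stringsAsFactors = FALSE)

# One row per CaseWorker/Client pair: do all rows match?
group_summary <- Df %>%
  mutate(match = Task == Task2) %>%
  group_by(CaseWorker, Client) %>%
  summarise(all_match = all(match), .groups = "drop")

# Attach the per-group flag to every original row; left_join keeps Df's row order
merged <- Df %>%
  left_join(group_summary, by = c("CaseWorker", "Client"))

print(merged)
```

If the summary table is not needed on its own, a grouped mutate(all_match = all(Task == Task2)) on Df should give the same column without the join.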
Upvotes: 2