I have the two simple dataframes below. I would like to use dplyr and the tidyverse to find the categories in "Task2" of the second dataframe (Df2) that are not in "Task" of the first dataframe (Df). I would like to use dplyr's "setdiff" function for this. Also, I would like to keep the corresponding time from the "Time" column of the second dataframe (Df2).
Therefore, the end product should include two rows, one for "Iron shirt" for client "Chris", with a total time of 30, and one row for client "Eric", with "Buy groceries", and the corresponding time of 8.
I would also like to drop the date column.
I'm thinking one way to do this would be to use dplyr's "setdiff" function (I realize the Task and Task2 column names will have to be changed so they match) to separate out the two rows, then rejoin the total time with a join function.
Finally, I would like this to be a custom function, since I will have to do this task repeatedly. Something like "Differences(Df1,Df2)", so that I can pass in the two dataframes and get the result back.
I hope this isn't asking too much! I'm new to custom functions, especially functions that incorporate dplyr and piping.
Hope someone can help me out!
CaseWorker <- c("John", "John", "Kim")
Client <- c("Chris", "Chris", "Eric")
Task <- c("Feed cat", "Make dinner", "Do homework")
Date <- c("10/27/2016", "09/22/2016", "10/11/2016")
Df <- data.frame(CaseWorker, Client, Date, Task)
Second dataframe...
CaseWorker <- c("John", "John", "John", "John", "John", "John", "John", "John", "John",
                "John", "Kim", "Kim", "Kim")
Client <- c("Chris", "Chris", "Chris", "Chris", "Chris", "Chris", "Chris", "Chris", "Chris",
            "Chris", "Eric", "Eric", "Eric")
Date <- c("11/10/2016", "10/10/2016", "11/13/2016", "09/18/2016", "11/11/2016", "09/19/2016",
          "08/08/2016", "10/10/2016", "08/05/2016", "11/12/2016", "09/09/2016", "11/11/2016",
          "09/10/2016")
Task2 <- c("Feed cat", "Feed cat", "Feed cat", "Feed cat", "Feed cat", "Make dinner",
           "Make dinner", "Make dinner", "Iron shirt", "Iron shirt", "Do homework",
           "Do homework", "Buy groceries")
Time <- c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12, 10, 8)
Df2 <- data.frame(CaseWorker, Client, Date, Task2, Time)
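We can use anti_join(), which keeps only the rows of Df2 whose Task2 value has no match in the Task column of Df. Summing Time within each group then collapses the remaining rows, and the Date column is dropped because it is neither a grouping variable nor summarised.
library(dplyr)
anti_join(Df2, Df, by = c("Task2" = "Task")) %>%
  group_by(CaseWorker, Client, Task2) %>%
  summarise(Time = sum(Time))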
# CaseWorker Client Task2 Time
# <fctr> <fctr> <fctr> <dbl>
#1 John Chris Iron shirt 30
#2 Kim Eric Buy groceries 8
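To see which rows the anti join keeps before they are summed, it can help to compare it with a plain filter on `%in%`. The sketch below is only an illustration: it rebuilds a trimmed copy of the question's data (Client, Task2 and Time only) and sets `stringsAsFactors = FALSE`, which is an assumption made here so the columns are plain character vectors on any R version.

```r
library(dplyr)

# Trimmed copies of the question's dataframes (assumption: only the
# columns needed for the comparison are kept).
Df <- data.frame(
  Task = c("Feed cat", "Make dinner", "Do homework"),
  stringsAsFactors = FALSE
)
Df2 <- data.frame(
  Client = c(rep("Chris", 10), rep("Eric", 3)),
  Task2 = c(rep("Feed cat", 5), rep("Make dinner", 3), rep("Iron shirt", 2),
            rep("Do homework", 2), "Buy groceries"),
  Time = c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12, 10, 8),
  stringsAsFactors = FALSE
)

# Rows of Df2 whose Task2 has no match in Df$Task, found two ways.
via_join <- anti_join(Df2, Df, by = c("Task2" = "Task"))
via_filter <- filter(Df2, !Task2 %in% Df$Task)

print(via_join)
```

Both versions should leave the two "Iron shirt" rows and the single "Buy groceries" row, which is what the grouped sum then reduces to 30 and 8.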
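To wrap this in a function, pass the dataframe that contains Task2 and Time first and the reference dataframe second:
DiffGoals <- function(dat1, dat2) {
  # dat1 must have a Task2 column; dat2 must have a Task column
  anti_join(dat1, dat2, by = c("Task2" = "Task")) %>%
    group_by(CaseWorker, Client, Task2) %>%
    summarise(Time = sum(Time))
}
DiffGoals(Df2, Df)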
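For completeness, the setdiff-then-join route described in the question can be sketched as well. This is not the approach used in the answer above: the name `Differences` and its argument order are taken from the question, `stringsAsFactors = FALSE` is an assumption so that both task columns are character, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Rebuild the question's data (assumption: strings kept as character).
Df <- data.frame(
  CaseWorker = c("John", "John", "Kim"),
  Client = c("Chris", "Chris", "Eric"),
  Date = c("10/27/2016", "09/22/2016", "10/11/2016"),
  Task = c("Feed cat", "Make dinner", "Do homework"),
  stringsAsFactors = FALSE
)
Df2 <- data.frame(
  CaseWorker = c(rep("John", 10), rep("Kim", 3)),
  Client = c(rep("Chris", 10), rep("Eric", 3)),
  Date = c("11/10/2016", "10/10/2016", "11/13/2016", "09/18/2016", "11/11/2016",
           "09/19/2016", "08/08/2016", "10/10/2016", "08/05/2016", "11/12/2016",
           "09/09/2016", "11/11/2016", "09/10/2016"),
  Task2 = c(rep("Feed cat", 5), rep("Make dinner", 3), rep("Iron shirt", 2),
            rep("Do homework", 2), "Buy groceries"),
  Time = c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12, 10, 8),
  stringsAsFactors = FALSE
)

# Hypothetical helper named after the question's request.
Differences <- function(Df1, Df2) {
  # One-column dataframes with matching names, so setdiff() can compare them:
  # tasks that appear in Df2$Task2 but not in Df1$Task.
  new_tasks <- setdiff(
    distinct(select(Df2, Task = Task2)),
    distinct(select(Df1, Task))
  )
  Df2 %>%
    inner_join(new_tasks, by = c("Task2" = "Task")) %>%  # keep only the new tasks
    group_by(CaseWorker, Client, Task2) %>%
    summarise(Time = sum(Time), .groups = "drop")        # Date is dropped here
}

result <- Differences(Df, Df2)
print(result)
```

On this data the result should match the anti_join version: one row for Chris with "Iron shirt" and a total of 30, and one row for Eric with "Buy groceries" and 8. The anti join is shorter because it does the comparison and the filtering in a single step.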